Abstract



We investigate alternative sampling laws for particle algorithms and the influence of these laws on the 
efficiency of particle approximations of marginal likelihoods in hidden Markov models. Amongst a broad 
class of candidates we characterize the essentially unique family of particle system transition kernels 
which is optimal with respect to an asymptotic-in-time variance growth rate criterion. The sampling 
structure of the algorithm defined by these optimal transitions turns out to be only quite subtly different 
from standard algorithms and yet the fluctuation properties of the estimates it provides are, in some ways, 
dramatically different. The structure of the optimal transition suggests a new class of algorithms, which 
we term "twisted" particle filters, and which we validate with asymptotic analysis of a more traditional 
nature, in the regime where the number of particles tends to infinity. 

1 Introduction 

A hidden Markov model (HMM) is a bi-variate process $\{(X_n, Y_n); n \geq 0\}$ where the signal process $(X_n; n \geq 0)$
is a Markov chain and each observation $Y_n$ is conditionally independent of the rest of the bi-variate process
given $X_n$. Each $X_n$ is valued in a state-space $\mathsf{X}$ endowed with a $\sigma$-algebra $\mathcal{X}$, and each $Y_n$ is valued in an
observation space $\mathsf{Y}$ endowed with a $\sigma$-algebra $\mathcal{Y}$. This paper is centrally motivated by the task of computing
the marginal likelihood of an observation sequence $(y_0, y_1, \ldots)$ under some assumed probability model for
the joint process $\{(X_n, Y_n); n \geq 0\}$; let $\mu_0$ and $f$ be respectively a probability distribution and a Markov
kernel on $(\mathsf{X}, \mathcal{X})$, and let $g$ be a Markov kernel acting from $(\mathsf{X}, \mathcal{X})$ to $(\mathsf{Y}, \mathcal{Y})$, with $g(x, \cdot)$ admitting a strictly
positive density, similarly denoted by $g(x, y)$, with respect to some dominating $\sigma$-finite measure. The hidden
Markov model specified by $\mu_0$, $f$ and $g$ is

\[ X_0 \sim \mu_0(\cdot), \qquad Y_0 \mid \{X_0 = x_0\} \sim g(x_0, \cdot), \]
\[ X_n \mid \{X_{n-1} = x_{n-1}\} \sim f(x_{n-1}, \cdot), \qquad Y_n \mid \{X_n = x_n\} \sim g(x_n, \cdot), \qquad n \geq 1. \tag{1} \]

In practice one is presented with $(Y_0, Y_1, \ldots)$ but typically does not know which particular $\mu_0$, $f$ and $g$ specify
the true law of this process. More broadly, one may not be sure that $(Y_0, Y_1, \ldots)$ are observations from an
HMM at all, but seeks to compare and fit different models to these data and it is here that the computation
of the marginal likelihood is especially desirable.

Let $\Omega := \mathsf{Y}^{\mathbb{Z}}$ be the set of doubly infinite sequences valued in $\mathsf{Y}$. We shall write $Y_n(\omega) = \omega(n)$ for
$\omega = \{\omega(n)\}_{n \in \mathbb{Z}} \in \Omega$. For $\omega \in \Omega$ we may take as a recursive definition of the (one-step-ahead) prediction
filters, the sequence of distributions $(\pi_n^\omega; n \geq 0)$ following

\[ \pi_0^\omega := \mu_0, \qquad \pi_n^\omega(A) := \frac{\int_{\mathsf{X}} \pi_{n-1}^\omega(dx)\, g(x, Y_{n-1}(\omega))\, f(x, A)}{\int_{\mathsf{X}} \pi_{n-1}^\omega(dx)\, g(x, Y_{n-1}(\omega))}, \qquad A \in \mathcal{X},\ n \geq 1, \tag{2} \]

and we let $(Z_n^\omega; n \geq 0)$ be the sequence defined by

\[ Z_0^\omega := 1, \qquad Z_n^\omega := Z_{n-1}^\omega \int_{\mathsf{X}} \pi_{n-1}^\omega(dx)\, g(x, Y_{n-1}(\omega)), \qquad n \geq 1, \tag{3} \]

which can, in principle, be computed sequentially as a by-product of the filtering recursion (2). Under the
model (1), $\pi_n^\omega$ is interpreted as the conditional distribution of $X_n$ given $Y_{0:n-1}(\omega)$; and $Z_n^\omega$ is the marginal
likelihood evaluated at the point $y_{0:n-1}(\omega)$. If we write $E$ for expectation with respect to the law of the
Markov chain $(X_n)$ under the assumed model (1), i.e. $X_n \mid X_{n-1} \sim f(X_{n-1}, \cdot)$ and $X_0 \sim \mu_0$, then we have
the standard identity

\[ Z_n^\omega = E\left[\prod_{p=0}^{n-1} g(X_p, Y_p(\omega))\right]. \]
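As a small illustration of how the filtering recursion yields the marginal likelihood, the following is a minimal sketch on a finite state-space, where every integral becomes a sum. The function names and the list-based representation of $\mu_0$, $f$ and $g$ are assumptions made for this example only; the second function evaluates the expectation in the standard identity by brute-force enumeration of state paths, so the two outputs should agree.

```python
import itertools


def prediction_filter_likelihood(mu0, f, g_obs):
    """Prediction-filter recursion on a finite state-space {0, ..., d-1}.

    mu0   : initial distribution, list of length d
    f     : transition matrix, f[x][x2] = probability of moving x -> x2
    g_obs : g_obs[p][x] = g(x, y_p) for the observed values y_0, ..., y_{n-1}
    Returns (pi_n, Z_n).
    """
    d = len(mu0)
    pi, Z = list(mu0), 1.0
    for g in g_obs:
        norm = sum(pi[x] * g[x] for x in range(d))  # integral of g against pi_{n-1}
        Z *= norm                                   # likelihood update
        pi = [sum(pi[x] * g[x] * f[x][x2] for x in range(d)) / norm
              for x2 in range(d)]                   # filter update
    return pi, Z


def brute_force_likelihood(mu0, f, g_obs):
    """Evaluate E[prod_p g(X_p, y_p)] by summing over all state paths."""
    d, n = len(mu0), len(g_obs)
    if n == 0:
        return 1.0
    total = 0.0
    for path in itertools.product(range(d), repeat=n):
        prob = mu0[path[0]]
        for a, b in zip(path, path[1:]):
            prob *= f[a][b]
        weight = 1.0
        for p, x in enumerate(path):
            weight *= g_obs[p][x]
        total += prob * weight
    return total
```

The brute-force sum costs $d^n$ operations and is only useful as a check; the recursion costs $O(n d^2)$.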



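The simplest particle filter, known as the "bootstrap" algorithm [Gordon et al., 1993], is given below.

Algorithm 1 Bootstrap particle filter

For n = 0,
    Sample $(\zeta_0^i)_{i=1}^N \overset{\text{iid}}{\sim} \mu_0$,
    Report $Z_0^{\omega,N} = 1$.

For n ≥ 1,
    Report $Z_n^{\omega,N} = Z_{n-1}^{\omega,N} \cdot \frac{1}{N} \sum_{i=1}^N g(\zeta_{n-1}^i, Y_{n-1}(\omega))$,
    Sample $(\zeta_n^i)_{i=1}^N \mid (\zeta_{n-1}^i)_{i=1}^N \overset{\text{iid}}{\sim} \frac{\sum_{i=1}^N g(\zeta_{n-1}^i, Y_{n-1}(\omega))\, f(\zeta_{n-1}^i, \cdot)}{\sum_{i=1}^N g(\zeta_{n-1}^i, Y_{n-1}(\omega))}$.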
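The steps of Algorithm 1 can be sketched in a few lines. This is a minimal illustration rather than a reference implementation: the function name and the callables `sample_mu0`, `sample_f` and `g` are hypothetical placeholders for a user-supplied model, and sampling from the mixture in the second step is done by multinomial selection of an ancestor followed by a move under $f$.

```python
import random


def bootstrap_particle_filter(ys, N, sample_mu0, sample_f, g, rng):
    """Return the estimate Z_n^{omega,N} for observations ys = (y_0, ..., y_{n-1}).

    sample_mu0(rng)  -> draw from mu_0            (assumed user-supplied)
    sample_f(x, rng) -> draw from f(x, .)         (assumed user-supplied)
    g(x, y)          -> strictly positive density value g(x, y)
    """
    particles = [sample_mu0(rng) for _ in range(N)]
    Z = 1.0
    for y in ys:
        weights = [g(x, y) for x in particles]
        Z *= sum(weights) / N                     # "Report" step
        # Sampling from the mixture: pick ancestors with probability
        # proportional to g, then move each one independently under f.
        ancestors = rng.choices(particles, weights=weights, k=N)
        particles = [sample_f(x, rng) for x in ancestors]
    return Z
```

For example, with `rng = random.Random(0)` and a two-state model, averaging the returned value over many independent runs should approach the exact marginal likelihood, in line with the lack-of-bias property.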



There are now many theoretical analyses of particle filtering algorithms which validate their application.
Consistency and fluctuation properties of particle algorithms in the regime $N \to \infty$ are well understood
[Del Moral and Guionnet, 1999; Chopin, 2004; Künsch, 2005; Douc and Moulines, 2008]; stability properties
of particle approximations have been expressed through finite-$N$ error bounds [Del Moral and Guionnet, 2001;
Cérou et al., 2011; Whiteley, 2011], time-uniform convergence [van Handel, 2009] and control on $N \to \infty$
asymptotic variance expressions [Del Moral and Jacod, 2001; Favetto, 2012; Whiteley, 2011; Douc et al., 2012].



In the present work we conduct a study of $Z_n^{\omega,N}$, and related quantities, in the regime where $N$ is fixed
and $n \to \infty$. Our overall aim is not just to validate algorithms, but to rigorously address more comparative
questions of how and why one algorithm may out-perform another, and how it may be possible to modify
standard algorithms in order to improve performance. Our study is formulated in a generic framework which
accommodates various standard particle algorithms and novel extensions. Much of the structure we uncover
is non-standard, and the algorithms we devise are new, so for the sake of an accessible introduction we shall
now summarize some of our intentions and findings in the context of the bootstrap particle filter as per
Algorithm 1 (more precise and complete statements will be given in a general setting later).

Writing $\mathbf{E}_N^\omega$ for expectation with respect to the law of the bootstrap particle filter processing a particular
fixed observation sequence $\omega \in \Omega$, the well known lack-of-bias property [Del Moral, 2004, Chapter 9] reads:

\[ \mathbf{E}_N^\omega\left[Z_n^{\omega,N}\right] = Z_n^\omega, \tag{4} \]

and holds for any $n \geq 0$ and $N \geq 1$. The main theme of this paper is exploration and analysis of novel particle
algorithms arising through changes of measure on the left hand side of (4); we shall consider alternative
sampling laws which govern the transitions of the particle system as a whole and which generalize the
sampling recipe in Algorithm 1. The resulting approximations of $Z_n^\omega$ will be of the form

\[ \widetilde{Z}_n^{\omega,N} := Z_n^{\omega,N} \prod_{p=1}^{n} \phi_p^{\omega,N}, \qquad n \geq 1, \tag{5} \]

where $Z_n^{\omega,N}$ is exactly the same functional of the particles as in Algorithm 1 and $\{\phi_n^{\omega,N}; n \geq 1\}$ is a sequence
of functionals chosen such that, if we write $\widetilde{\mathbf{E}}_N^\omega$ for expectation under the (as yet unspecified) alternative
sampling law, then the lack of bias property is preserved:

\[ \widetilde{\mathbf{E}}_N^\omega\left[\widetilde{Z}_n^{\omega,N}\right] = Z_n^\omega. \tag{6} \]
Our main objective is to identify "good" choices of alternative sampling laws, possibly allowing the
transitions of the particles to depend on past and/or future observations. Our criterion for performance
will involve certain fluctuation properties of $\widetilde{Z}_n^{\omega,N}$; we shall obtain a mathematically fruitful study of the
normalized second moment of $\widetilde{Z}_n^{\omega,N}$, in the regime where $N$ is fixed and $n \to \infty$, in an $\omega$-pathwise fashion.
This is where the explicit instantiation of $\Omega$ will be utilized, and certain ergodicity hypotheses will be
introduced to facilitate our $n \to \infty$ limit theory.

For now, let us still consider $\omega \in \Omega$ as fixed. Then under the probability law corresponding to
Algorithm 1 the generations of the particle system, $\zeta_0, \zeta_1, \ldots$ with $\zeta_n := (\zeta_n^1, \ldots, \zeta_n^N)$ form an $\mathsf{X}^N$-valued
time-inhomogeneous Markov chain. Let $\{\mathbf{M}^\omega; \omega \in \Omega\}$ be the family of Markov kernels such that for each $\omega \in \Omega$,
$\mathbf{M}^\omega : \mathsf{X}^N \times \mathcal{X}^{\otimes N} \to [0, 1]$ is given by

\[ \mathbf{M}^\omega(x, dz) = \prod_{i=1}^{N} \frac{\sum_{j=1}^{N} g(x^j, Y_0(\omega))\, f(x^j, dz^i)}{\sum_{j=1}^{N} g(x^j, Y_0(\omega))}, \tag{7} \]

with $x = (x^1, \ldots, x^N) \in \mathsf{X}^N$ and $z = (z^1, \ldots, z^N) \in \mathsf{X}^N$. Let $\theta : \Omega \to \Omega$ be the shift operator, $(\theta\omega)(n) :=
\omega(n + 1)$, $n \in \mathbb{Z}$, $\omega \in \Omega$, so that for example $Y_0(\theta\omega) = Y_1(\omega)$. The $n$-fold iterate of $\theta$ will be written $\theta^n$ with
$\theta^0 = \mathrm{Id}$. It is then clear that the sampling steps of Algorithm 1 implement:

\[ \zeta_0 \sim \mu_0^{\otimes N}, \qquad \zeta_n \mid \zeta_{n-1} \sim \mathbf{M}^{\theta^{n-1}\omega}(\zeta_{n-1}, \cdot), \qquad n \geq 1. \tag{8} \]
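Variance Growth Rates

We shall study algorithms which involve sampling the particle system under alternatives to (8) and which
in general may be of a very different nature to (7); for a family of Markov kernels $\widetilde{\mathbf{M}} = \{\widetilde{\mathbf{M}}^\omega; \omega \in \Omega\}$ belonging
to a broad class of candidates and which may depend on $\omega$ in a rather general fashion, but subject to

\[ \mathbf{M}^\omega(x, \cdot) \ll \widetilde{\mathbf{M}}^\omega(x, \cdot), \]

and other regularity conditions, we shall consider sampling the particle system according to

\[ \zeta_0 \sim \mu_0^{\otimes N}, \qquad \zeta_n \mid \zeta_{n-1} \sim \widetilde{\mathbf{M}}^{\theta^{n-1}\omega}(\zeta_{n-1}, \cdot), \qquad n \geq 1, \tag{9} \]

and simply setting

\[ \phi_n^{\omega,N} := \frac{d\mathbf{M}^{\theta^{n-1}\omega}(\zeta_{n-1}, \cdot)}{d\widetilde{\mathbf{M}}^{\theta^{n-1}\omega}(\zeta_{n-1}, \cdot)}(\zeta_n), \qquad n \geq 1. \]

Then letting $\widetilde{\mathbf{E}}_N^\omega$ denote expectation under the Markov law (9), and with $\widetilde{Z}_n^{\omega,N}$ as in (5), we of course achieve (6).

Let $\Omega$ be endowed with the product $\sigma$-algebra $\mathcal{F} = \mathcal{Y}^{\otimes \mathbb{Z}}$ and let $\mathbb{P}$ be a probability measure on $(\Omega, \mathcal{F})$.
The requirement to compute $Z_n^\omega$ typically arises when calibrating and comparing HMMs, so it is very
important that our study accommodates model mis-specification and we stress that $\mathbb{P}$ is not necessarily the
measure on observation sequences induced by the particular HMM (1), nor necessarily the measure induced
by any HMM. Under the assumptions that $\theta$ is $\mathbb{P}$-preserving and ergodic, and under certain other regularity
conditions, application of our first main result, Proposition 4, establishes, for any fixed $N \geq 1$, existence of
a deterministic constant $\Upsilon_N(\widetilde{\mathbf{M}})$, depending on $\widetilde{\mathbf{M}} = \{\widetilde{\mathbf{M}}^\omega; \omega \in \Omega\}$, such that

\[ \frac{1}{n} \log \frac{\widetilde{\mathbf{E}}_N^\omega\left[\left(\widetilde{Z}_n^{\omega,N}\right)^2\right]}{\left(Z_n^\omega\right)^2} \longrightarrow \Upsilon_N(\widetilde{\mathbf{M}}), \quad \text{as } n \to \infty, \text{ for } \mathbb{P}\text{-a.a. } \omega. \tag{10} \]

When $\Upsilon_N(\widetilde{\mathbf{M}})$ exists, it must be the case that $\Upsilon_N(\widetilde{\mathbf{M}}) \geq 0$, because variance is non-negative and the
lack of bias property, (6), holds. We shall see that it is typically the case that $\Upsilon_N(\mathbf{M}) > 0$.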
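Optimal Sampling

Our second main result (Theorem 1) identifies, for any fixed $N \geq 1$ and amongst the class of candidates,
the essentially unique choice of the family $\widetilde{\mathbf{M}}$ which achieves $\Upsilon_N(\widetilde{\mathbf{M}}) = 0$. It turns out that this
optimal choice of $\widetilde{\mathbf{M}}$ arises from a particular form of re-weighting applied to each transition kernel
$\mathbf{M}^\omega$, and is defined in terms of a family of functions $\{h^\omega : \mathsf{X} \to \mathbb{R}_+; \omega \in \Omega\}$ which are, in abstract terms,
generalized eigen-functions associated with algebraic structures underlying the particle algorithm. These
functions have a particular probabilistic interpretation in the context of HMMs, which we shall discuss in
a later section. Theorem 1 establishes that for any $N \geq 1$, $\Upsilon_N(\widetilde{\mathbf{M}}) = 0$ if and only if, for $\mathbb{P}$-almost all $\omega \in \Omega$,
there exists a set $A_\omega \in \mathcal{X}^{\otimes N}$ such that $A_\omega^c$ is null (with respect to an as yet un-named measure) and for any
$x \in A_\omega$,

\[ \widetilde{\mathbf{M}}^\omega(x, B) = \frac{\int_B \mathbf{M}^\omega(x, dx')\, \mathbf{h}^{\theta\omega}(x')}{\int_{\mathsf{X}^N} \mathbf{M}^\omega(x, dz)\, \mathbf{h}^{\theta\omega}(z)}, \qquad \text{for all } B \in \mathcal{X}^{\otimes N}, \tag{11} \]

where

\[ \mathbf{h}^\omega : x = (x^1, \ldots, x^N) \in \mathsf{X}^N \longmapsto \frac{1}{N} \sum_{i=1}^{N} h^\omega(x^i) \in \mathbb{R}_+. \]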

Having begun this introductory discussion by stating the algorithmic definition of the bootstrap particle
filter, then writing the more probabilistic representation (7)-(8), we shall now reverse this programme and
proceed to write down the algorithm which arises when sampling under a re-weighted Markov transition
such as (11). In the rare-event and large deviations literatures, the action of re-weighting Markov kernels
using non-negative eigen-functions is generically referred to as "twisting". Since in the present context we
are applying re-weighting to the transitions of the entire particle system, we shall adopt this terminology
and consider a class of algorithms which we refer to as twisted particle filters.



Twisted Particle Filters 



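The form of the optimal transition (11), where $\mathbf{M}^\omega$ is re-weighted by an additive, non-negative functional,
leads us to consider a new class of particle algorithms. Consider a family of functions $\{\psi^\omega : \mathsf{X} \to \mathbb{R}_+; \omega \in \Omega\}$
and let $\widetilde{\mathbf{M}}$ be defined by

\[ \widetilde{\mathbf{M}}^\omega(x, dx') \propto \mathbf{M}^\omega(x, dx')\, \boldsymbol{\psi}^{\theta\omega}(x'), \tag{12} \]

with

\[ \boldsymbol{\psi}^\omega : x = (x^1, \ldots, x^N) \in \mathsf{X}^N \longmapsto \frac{1}{N} \sum_{i=1}^{N} \psi^\omega(x^i) \in \mathbb{R}_+. \tag{13} \]

This setup clearly admits the optimal transition ($\psi^\omega = h^\omega$) and the standard transition (take $\psi^\omega = c$, for
some positive constant $c$), as special cases. Then introducing

\[ \widetilde{\psi}^\omega(x) := g(x, Y_0(\omega)) \int_{\mathsf{X}} f(x, dz)\, \psi^{\theta\omega}(z), \]

we observe that

\[ \frac{d\mathbf{M}^{\theta^{n-1}\omega}(\zeta_{n-1}, \cdot)}{d\widetilde{\mathbf{M}}^{\theta^{n-1}\omega}(\zeta_{n-1}, \cdot)}(\zeta_n) = \frac{\sum_{i=1}^{N} \widetilde{\psi}^{\theta^{n-1}\omega}(\zeta_{n-1}^i)}{\sum_{i=1}^{N} g(\zeta_{n-1}^i, Y_{n-1}(\omega))} \cdot \frac{1}{\frac{1}{N} \sum_{i=1}^{N} \psi^{\theta^n\omega}(\zeta_n^i)}. \]

Since $\boldsymbol{\psi}^\omega$ is an additive functional, it is clear that $\widetilde{\mathbf{M}}^\omega$ as per (12) will be of mixture form, and introducing
the $\omega$-dependent Markov kernel

\[ \widetilde{f}^\omega(x, dx') := \frac{f(x, dx')\, \psi^{\theta\omega}(x')}{\int_{\mathsf{X}} f(x, dz)\, \psi^{\theta\omega}(z)}, \]

the procedure of sampling from (9) and evaluating $\widetilde{Z}_n^{\omega,N}$ can be implemented through Algorithm 2 (here $K_n$
and $A_n$ are just some auxiliary random variables employed for algorithmic purposes).

Algorithm 2 Twisted bootstrap particle filter

For n = 0,
    Sample $(\zeta_0^i)_{i=1}^N \overset{\text{iid}}{\sim} \mu_0$,
    Report $\widetilde{Z}_0^{\omega,N} = 1$.

For n ≥ 1,
    Sample $K_n$ from the uniform distribution on $\{1, \ldots, N\}$,
    Sample $A_n$ from the distribution on $\{1, \ldots, N\}$ with probabilities proportional to
        $\{\widetilde{\psi}^{\theta^{n-1}\omega}(\zeta_{n-1}^1), \ldots, \widetilde{\psi}^{\theta^{n-1}\omega}(\zeta_{n-1}^N)\}$,
    Sample $\zeta_n^{K_n} \mid \{A_n, K_n, (\zeta_{n-1}^i)_{i=1}^N\} \sim \widetilde{f}^{\theta^{n-1}\omega}(\zeta_{n-1}^{A_n}, \cdot)$,
    Sample $(\zeta_n^i)_{i \neq K_n} \mid \{K_n, (\zeta_{n-1}^i)_{i=1}^N\} \overset{\text{iid}}{\sim} \frac{\sum_{j=1}^N g(\zeta_{n-1}^j, Y_{n-1}(\omega))\, f(\zeta_{n-1}^j, \cdot)}{\sum_{j=1}^N g(\zeta_{n-1}^j, Y_{n-1}(\omega))}$,
    Report $\widetilde{Z}_n^{\omega,N} = \widetilde{Z}_{n-1}^{\omega,N} \cdot \frac{\sum_{i=1}^N \widetilde{\psi}^{\theta^{n-1}\omega}(\zeta_{n-1}^i)}{\sum_{i=1}^N \psi^{\theta^n\omega}(\zeta_n^i)}$.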
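The twisted sampling scheme can be sketched on a finite state-space, where the integral of the twisting function against $f$ and the twisted kernel are available in closed form. This is an illustrative sketch under that assumption, not the authors' implementation; the function name, the list-based model representation and the convention that `psi[n]` holds the twisting function applied to generation $n$ are all hypothetical choices for this example.

```python
import random


def twisted_bootstrap_filter(mu0, f, g_obs, psi, N, rng):
    """Twisted bootstrap particle filter on the finite state-space {0, ..., d-1}.

    mu0      : initial distribution (list of length d)
    f        : transition matrix, f[x][z]
    g_obs[p] : list with g_obs[p][x] = g(x, y_p)
    psi[n]   : strictly positive twisting function for generation n,
               needed for n = 1, ..., len(g_obs); psi[0] is unused
    Returns the twisted estimate of the marginal likelihood.
    """
    states = range(len(mu0))
    particles = rng.choices(states, weights=mu0, k=N)
    Z = 1.0
    for n in range(1, len(g_obs) + 1):
        g, psi_n = g_obs[n - 1], psi[n]
        # psi_tilde[x] = g(x) * sum_z f(x, z) psi_n(z)
        psi_tilde = [g[x] * sum(f[x][z] * psi_n[z] for z in states) for x in states]
        K = rng.randrange(N)  # index of the single twisted particle
        A = rng.choices(range(N), weights=[psi_tilde[x] for x in particles])[0]
        g_weights = [g[x] for x in particles]
        new = []
        for i in range(N):
            if i == K:
                x = particles[A]
                kernel = [f[x][z] * psi_n[z] for z in states]  # twisted kernel
            else:
                x = rng.choices(particles, weights=g_weights)[0]
                kernel = f[x]                                  # standard move
            new.append(rng.choices(states, weights=kernel)[0])
        Z *= sum(psi_tilde[x] for x in particles) / sum(psi_n[x] for x in new)
        particles = new
    return Z
```

A useful sanity check in this finite setting: over a finite horizon, taking `psi[n]` to be the backward function $x \mapsto E[\prod_{p \geq n} g(X_p, y_p) \mid X_n = x]$ makes the ratios telescope, so that with a point-mass initial distribution the returned value equals the exact marginal likelihood on every run.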



In algorithmic terms, the difference between the sampling steps of Algorithm 2 and Algorithm 1 is fairly
subtle: loosely speaking, at each time step, $N - 1$ of the particles in Algorithm 2 are propagated by the
same mechanism as in Algorithm 1. However, with an appropriate choice of $\psi^\omega$, the fluctuation properties
of $\widetilde{Z}_n^{\omega,N}$ under (12)-(13) can be dramatically different to those of $Z_n^{\omega,N}$ under (7).

Our third main result (Theorem 2) concerns asymptotic fluctuation properties of twisted particle
approximations in the regime where $n$ and $\omega$ are fixed and $N \to \infty$. Under mild regularity conditions, we prove
central limit theorems for generic particle systems under transitions like (12)-(13). For bounded test functions
$\varphi$, we find that the $N \to \infty$ asymptotic variance associated with the empirical averages $N^{-1/2} \sum_{i=1}^{N} \varphi(\zeta_n^i)$
is the same when sampled under Algorithms 1 and 2, but the asymptotic variance associated with $\sqrt{N} Z_n^{\omega,N}$
sampled under Algorithm 1 is in general different to the asymptotic variance associated with $\sqrt{N} \widetilde{Z}_n^{\omega,N}$ sampled
under Algorithm 2.

The finite-$N$, finite-$n$ behaviour of the relative variance of the standard estimate $Z_n^{\omega,N}$ from Algorithm
1 is well understood. Under certain regularity assumptions, it can be deduced from existing theory for
standard particle algorithms [Cérou et al., 2011, Theorem 5.1] that in our setting $\Upsilon_N(\mathbf{M})$ must satisfy

\[ \Upsilon_N(\mathbf{M}) \leq \log\left[1 + \frac{C}{N - 1}\right], \tag{14} \]

for some finite constant $C$ which depends on $g$ and $f$. Our fourth main result (Proposition 5) generalizes (14)
to the case of twisted particle filters. With $\Upsilon_N(\widetilde{\mathbf{M}})$ as in (10) and $\widetilde{\mathbf{M}}^\omega$ as in (12), application of Proposition
5 shows that



Tjv(M) < log 



N~ 1 



where 



Vp{^P,h) :=: 



ess sup^ 



2 sup - — — - 



sup 



sup 



/i"(z) /i"(z') 



Thus whenever $\Upsilon_N(\mathbf{M}) > 0$, by choosing $\psi$ "close" to $h$ so that the relevant oscillations are controlled, we
can, in principle, achieve $\Upsilon_N(\widetilde{\mathbf{M}}) < \Upsilon_N(\mathbf{M})$.
Some remarks on connections to the literature are in order. Part of the generalized eigen-value theory
through which $h^\omega$ arises (Proposition 2) is a variant on the theme of Kifer's Perron-Frobenius theorem
for positive operators in a random environment [Kifer, 1996, Theorem 3.1]; our conditions and technique
of proof are different. Re-weighted particle transitions like (12) are reminiscent of certain eigen-function
transformations of general-type branching processes studied by Athreya [2000], and more broadly can be
viewed as a randomized version of Doob's $h$-transform. To the knowledge of the authors, such transformations
applied to the type of particle systems we consider have not previously been studied, or exploited for algorithmic
purposes. Lastly, part of the proof of our Theorem 1 generalizes the proof of the necessity part in [Bucklew
et al., 1990, Theorem 3], to the case of families of kernels driven by an ergodic measure-preserving transform.



The rest of the paper is structured as follows. Section 2 introduces our general setting, addresses the
generalized eigen-value properties of families of non-negative kernels and sampling laws of the particle systems
we consider. Section 3 narrows attention to a particular class of particle transitions, which we refer to as
"twisted", and considers some of their properties in the regime where $N$ is fixed and $n \to \infty$, and vice-versa.
Section 4 discusses the application of our main results to sequential importance sampling, bootstrap and
auxiliary particle filters. Most of the proofs are given in section 5.



2 Non-negative kernels, sampling particles and variance growth 
2.1 Notation and conventions 

Henceforth we fix two measurable spaces $(\mathsf{X}, \mathcal{X})$ and $(\mathsf{Y}, \mathcal{Y})$. We are going to investigate a class of particle
algorithms whose proposal and resampling mechanisms at each algorithmic time step may depend on an
arbitrarily large number of observations, in quite a general fashion, and we will study properties of these
algorithms under some assumptions of stationarity and ergodicity of the observation process. To this end
let $\Omega := \mathsf{Y}^{\mathbb{Z}}$ be the set of doubly infinite sequences valued in $\mathsf{Y}$, let $\mathcal{F} := \mathcal{Y}^{\otimes \mathbb{Z}}$ and let $Y = \{Y_n\}_{n \in \mathbb{Z}}$ be the
coordinate process on $\Omega$, i.e. $Y_n(\omega) = \omega(n)$ for $\omega = \{\omega(n)\}_{n \in \mathbb{Z}} \in \Omega$. Let $\theta : \Omega \to \Omega$ be the shift operator
$(\theta\omega)(n) := \omega(n + 1)$, $n \in \mathbb{Z}$, $\omega \in \Omega$. For $n \in \mathbb{Z}$, the $n$-fold iterate of $\theta$ is written $\theta^n$, and we take $\theta^0 := \mathrm{Id}$.
Let $\mathbb{P}$ be a probability measure on $(\Omega, \mathcal{F})$. Expectation w.r.t. $\mathbb{P}$ will be denoted by $\mathbb{E}$.

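Let $\mathcal{M}(\mathsf{X})$, $\mathcal{P}(\mathsf{X})$ and $\mathcal{L}(\mathsf{X})$ be respectively the collections of measures, probability measures and bounded,
real-valued, $\mathcal{X}$-measurable functions on $\mathsf{X}$. We write

\[ \|\varphi\| := \sup_{x} |\varphi(x)| \]

and

\[ \mu(\varphi) := \int_{\mathsf{X}} \varphi(x)\, \mu(dx), \qquad \text{for any } \varphi \in \mathcal{L}(\mathsf{X}),\ \mu \in \mathcal{M}(\mathsf{X}). \tag{15} \]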

Whenever discussing functions whose co-domain is $\mathbb{R}$, we will assume that $\mathbb{R}$ is equipped with the Borel
$\sigma$-algebra and all measurability statements regarding such functions will implicitly pertain to this $\sigma$-algebra
on their co-domain.

We will be dealing throughout with various real-valued functions on $\Omega \times \mathsf{X}$ (and more generally $\Omega \times \mathsf{X}^N$
etc.). For any such function $\varphi$, we write the $\omega$-section of $\varphi$ as $\varphi^\omega : \mathsf{X} \to \mathbb{R}$, $\varphi^\omega(x) := \varphi(\omega, x)$. For a function
$\xi : \Omega \to \mathbb{R}$ it will sometimes be convenient to write $\xi_\omega$ instead of the more standard $\xi(\omega)$.

We will need to express various integration operations involving functions on $\Omega \times \mathsf{X}^N$ and their $\omega$-sections,
so for completeness we quote the following facts of measure theory, see for example [Doob, 1994, Chapter
VI], which will be used in the sequel in various places without further comment: when $\varphi : \Omega \times \mathsf{X}^N \to \mathbb{R}$
is measurable w.r.t. $\mathcal{F} \otimes \mathcal{X}^{\otimes N}$, then for every $\omega \in \Omega$, the $\omega$-section $\varphi^\omega$ is measurable w.r.t. $\mathcal{X}^{\otimes N}$; and
furthermore for any $\sigma$-finite measure $\mu$ on $(\mathsf{X}^N, \mathcal{X}^{\otimes N})$, if $\varphi$ is integrable w.r.t. $\mathbb{P} \otimes \mu$, then the function
acting $\Omega \to \mathbb{R}$ which maps $\omega \mapsto \mu(\varphi^\omega)$ is measurable w.r.t. $\mathcal{F}$ and is $\mathbb{P}$-integrable.

Let $N \geq 1$ be fixed and let $\varphi$, $\widetilde{\varphi}$ be two functions, each acting $\Omega \times \mathsf{X}^N \to \mathbb{R}$ and each measurable w.r.t.
$\mathcal{F} \otimes \mathcal{X}^{\otimes N}$. We will need to talk about the sets on which two such functions take the same values. For any
$\omega \in \Omega$, let $A_\omega := \{x \in \mathsf{X}^N : \varphi^\omega(x) = \widetilde{\varphi}^\omega(x)\}$ and let $\mu$ be a $\sigma$-finite measure on $(\mathsf{X}^N, \mathcal{X}^{\otimes N})$. In order to
avoid having to make the sets $\{A_\omega; \omega \in \Omega\}$ explicit in various statements, we will write by convention:

\[ \text{for } \mathbb{P}\text{-a.a. } \omega, \quad \varphi^\omega(x) = \widetilde{\varphi}^\omega(x), \quad \text{for } \mu\text{-a.a. } x \]

to mean

\[ \mathbb{P}\left(\{\omega : \mu(A_\omega^c) = 0\}\right) = 1. \]



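2.2 Generalized eigen-value theory for non-negative kernels

Fix arbitrarily $\mu_0 \in \mathcal{P}(\mathsf{X})$ and let $M : \Omega \times \mathsf{X} \times \mathcal{X} \to [0, 1]$ be such that $M(\omega, x, \cdot) \in \mathcal{P}(\mathsf{X})$ for each $(\omega, x) \in \Omega \times \mathsf{X}$,
and $M(\cdot, \cdot, A)$ is $\mathcal{F} \otimes \mathcal{X}$-measurable for each $A \in \mathcal{X}$. Then for any $\omega$, $M(\omega, \cdot, \cdot)$ is a Markov kernel on $(\mathsf{X}, \mathcal{X})$
and when it is important to emphasize this perspective, we shall often write $M^\omega(x, A)$ instead of $M(\omega, x, A)$.
We shall adopt similar notation for other kernels.

For any fixed $\omega \in \Omega$, let $E_{\mu_0}^\omega$ denote expectation with respect to the law of the time-inhomogeneous
Markov chain $\{X_n; n \geq 0\}$, with each $X_n$ valued in $\mathsf{X}$, initialized from $X_0 \sim \mu_0$ and $X_n \mid \{X_{n-1} = x_{n-1}\} \sim
M^{\theta^{n-1}\omega}(x_{n-1}, \cdot)$, for $n \geq 1$. Let $G : \Omega \times \mathsf{X} \to \mathbb{R}_+$ be a $\mathcal{F} \otimes \mathcal{X}$-measurable, strictly positive and bounded
function.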

Remark 1. This setup is purposefully generic and accommodates, as one particular instance, the case

\[ G^\omega(x) = g(x, Y_0(\omega)), \qquad M^\omega(x, dx') = f(x, dx'), \qquad \forall \omega \in \Omega, \tag{16} \]

where $g$ and $f$ are as in section 1 and then

\[ E_{\mu_0}^\omega\left[\prod_{p=0}^{n-1} G^{\theta^p\omega}(X_p)\right] = Z_n^\omega. \]

Furthermore, with the particular $G$, $M$ as in (16), the generic particle system which we will go on to introduce
in section 2.3 is the bootstrap particle filter discussed in section 1. However, we stress that (16) is just one
instance of our generic setup; others will be discussed in section 4.
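We next introduce two hypotheses.

(H1) The shift operator $\theta$ preserves $\mathbb{P}$ and is ergodic.

(H2) There exist constants $\beta \in [1, \infty)$, $(\epsilon_-, \epsilon_+) \in (0, \infty)^2$, and $\nu \in \mathcal{P}(\mathsf{X})$ such that

\[ \frac{G(\omega, x)}{G(\omega', x')} \leq \beta, \qquad \forall (\omega, \omega', x, x') \in \Omega^2 \times \mathsf{X}^2, \tag{17} \]

\[ \epsilon_- \nu(\cdot) \leq M(\omega, x, \cdot) \leq \epsilon_+ \nu(\cdot), \qquad \forall (\omega, x) \in \Omega \times \mathsf{X}. \tag{18} \]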



Since in our setup we have taken $\Omega := \mathsf{Y}^{\mathbb{Z}}$, hypothesis (H1) amounts to saying that the observation
process is stationary and ergodic. However, the setup $\Omega := \mathsf{Y}^{\mathbb{Z}}$ has only been chosen in order to connect
our general framework with the motivating context of HMMs and thereby provide intuition to the reader;
in order for (H1) to be meaningful and the arguments in the proofs of our main results to hold, it is enough
that $(\Omega, \mathcal{F}, \mathbb{P})$ is just some abstract probability space, and that $\theta : \Omega \to \Omega$ is an invertible transform.

Hypothesis (H2) is a strong mixing condition, and will rarely hold when $\mathsf{X}$ and $\mathsf{Y}$ are non-compact. We
adopt this hypothesis partly for convenience of presentation; in some places in our proofs for the results
of section 2.2 where we adopt (H2), our arguments will not actually rely on both (17) and (18) holding
simultaneously, but we package these conditions together in order to avoid a layer of technical presentation
which would further lengthen and complicate our proofs. Our priority in section 2.2 is to move swiftly through
these preparatory details and on to the more novel matter which appears later. Some further remarks on
this issue are deferred until the end of section 2.2.

It is well known that (H1) and (H2) are together more than enough to establish the following result
(for related ideas in the context of HMMs see [Leroux, 1992]), and we include a proof only for purposes of
exposition: the phenomena it describes and some of the objects involved are the cornerstone for what follows
in subsequent sections.
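Proposition 1. Assume (H1) and (H2). Then there exists a constant $\Lambda \in (-\infty, \infty)$ independent of the
initial distribution $\mu_0 \in \mathcal{P}(\mathsf{X})$ such that

\[ \frac{1}{n} \log E_{\mu_0}^\omega\left[\prod_{p=0}^{n-1} G^{\theta^p\omega}(X_p)\right] \longrightarrow \Lambda, \quad \text{as } n \to \infty,\ \mathbb{P}\text{-a.s.} \tag{19} \]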



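The proof of Proposition 1, given below, can be expressed conveniently using a little further operator-type
notation. We now introduce the non-negative kernel

\[ Q : \Omega \times \mathsf{X} \times \mathcal{X} \to \mathbb{R}_+, \qquad Q(\omega, x, dx') := G(\omega, x) M(\omega, x, dx'). \]

For any fixed $\omega \in \Omega$ define the operators

\[ Q^\omega(\varphi)(x) := \int_{\mathsf{X}} Q^\omega(x, dx')\, \varphi(x'), \qquad \varphi \in \mathcal{L}(\mathsf{X}), \tag{20} \]

\[ \mu Q^\omega(\cdot) := \int_{\mathsf{X}} \mu(dx)\, Q^\omega(x, \cdot), \qquad \mu \in \mathcal{M}(\mathsf{X}), \tag{21} \]

and let $\{Q_n^\omega; n \in \mathbb{N}\}$ be defined inductively by

\[ Q_0^\omega := \mathrm{Id}, \qquad Q_n^\omega := Q_{n-1}^\omega Q^{\theta^{n-1}\omega}, \qquad n \geq 1. \tag{22} \]

This operator notation allows us to express

\[ \mu_0 Q_n^\omega(1) = E_{\mu_0}^\omega\left[\prod_{p=0}^{n-1} G^{\theta^p\omega}(X_p)\right], \qquad n \geq 1. \tag{23} \]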


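Proof (of Proposition 1). Under (H2) we have $g_- := \inf_{\omega, x} G^\omega(x) > 0$. Now consider the sequence of random
variables $\{\kappa_n^\omega; n \geq 1\}$ defined by

\[ \kappa_n^\omega := \nu Q_{n-1}^{\theta\omega}(1)\, g_- \epsilon_-. \]

From the definition (22) it is straightforward to establish by induction the following semigroup property:
for any $\omega \in \Omega$, and $p, n \geq 0$,

\[ Q_{p+n}^\omega = Q_p^\omega Q_n^{\theta^p\omega}, \]

and this combined with (H2) gives

\[ \kappa_{p+n}^\omega = \nu Q_{p+n-1}^{\theta\omega}(1)\, g_- \epsilon_- = \nu Q_{p-1}^{\theta\omega} Q^{\theta^p\omega} Q_{n-1}^{\theta^{p+1}\omega}(1)\, g_- \epsilon_- \geq \kappa_p^\omega\, \kappa_n^{\theta^p\omega}, \]

and so

\[ -\log \kappa_{p+n}^\omega \leq -\log \kappa_p^\omega - \log \kappa_n^{\theta^p\omega}. \]

Furthermore under (H2), $\sup_{\omega, x} G^\omega(x) < \infty$, so there exists a finite constant $c$ such that

\[ \int_\Omega -\log \kappa_n^\omega\, \mathbb{P}(d\omega) \geq -cn \]

for any $n \geq 1$, and these considerations, combined with (H1), allow application of Kingman's sub-additive ergodic
theorem to establish that there exists a constant $\Lambda \in (-\infty, \infty)$ such that

\[ \frac{1}{n} \log \kappa_n^\omega \longrightarrow \Lambda, \quad \text{as } n \to \infty,\ \mathbb{P}\text{-a.s.} \]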
n 

Under (H2) , for any u efl, n>l, x eX and ^ G 'P(X), 

^g-(i) ^ ^Q-g^i(i) ^ e+ 

A'g^(i) mQ"Q^i(i) - 
and combining this with a lower bound of a similar form we find



sup 



-iog<--iogA^g^(i) 

71 n 



1 



1 



< - log +- log(e_5) 



\ogfiQ'^{l) — > A, as n — > oo 



a.s. 



The proof is complete upon noting (23).



□ 



It turns out that Proposition 1 is one element of a generalized eigen-value theory for the non-negative
kernel $Q$. Under (H1) and (H2), Proposition 2 below, directly inspired by [Kifer, 1996, Theorem 3.1],
identifies a function $h : \Omega \times \mathsf{X} \to \mathbb{R}_+$ and a random variable $\lambda : \Omega \to \mathbb{R}_+$ satisfying

\[ Q^\omega(h^{\theta\omega}) = \lambda_\omega h^\omega. \]

As we shall go on to see, the expected value of $\log \lambda$ under $\mathbb{P}$ is equal to the constant $\Lambda$ appearing in
Proposition 1 and $h$ plays a special role in the construction of efficient particle algorithms.
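Now let $\Phi^\omega : \mathcal{P}(\mathsf{X}) \to \mathcal{P}(\mathsf{X})$ be defined by

\[ \Phi^\omega(\mu)(\cdot) := \frac{\mu Q^\omega(\cdot)}{\mu Q^\omega(1)}, \qquad \mu \in \mathcal{P}(\mathsf{X}), \]

and let $\{\Phi_n^\omega; n \in \mathbb{N}\}$ be the family of operators defined inductively by

\[ \Phi_0^\omega := \mathrm{Id}, \qquad \Phi_n^\omega := \Phi^{\theta^{n-1}\omega} \circ \Phi_{n-1}^\omega, \qquad n \geq 1, \]

so that each $\Phi_n^\omega$ acts $\mathcal{P}(\mathsf{X}) \to \mathcal{P}(\mathsf{X})$. Under these definitions, for any $n \in \mathbb{N}$,

\[ \Phi_n^\omega(\mu)(\cdot) = \frac{\mu Q_n^\omega(\cdot)}{\mu Q_n^\omega(1)}, \tag{24} \]

which can be verified by induction, since from the above definitions $\Phi_0^\omega = \mathrm{Id}$, $Q_0^\omega := \mathrm{Id}$ and when (24) holds,

\[ \Phi_{n+1}^\omega(\mu) = \Phi^{\theta^n\omega} \circ \Phi_n^\omega(\mu) = \frac{\Phi_n^\omega(\mu) Q^{\theta^n\omega}}{\Phi_n^\omega(\mu) Q^{\theta^n\omega}(1)} = \frac{\mu Q_n^\omega Q^{\theta^n\omega}}{\mu Q_n^\omega Q^{\theta^n\omega}(1)} = \frac{\mu Q_{n+1}^\omega}{\mu Q_{n+1}^\omega(1)}. \]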



Remark 2. In the setting $M^\omega(x, dx') := f(x, dx')$, $G^\omega(x) := g(x, Y_0(\omega))$, if $\mu_0$ and $\pi_n^\omega$ are respectively
the initial distribution and prediction-filter as in (2), we have

\[ \pi_{n+1}^\omega = \Phi^{\theta^n\omega}(\pi_n^\omega), \qquad n \geq 0. \]
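A practical consequence of the representation of $\Phi_n^\omega$ as a normalized version of $Q_n^\omega$ is that $\log \mu Q_n^\omega(1)$ can be accumulated as a sum of logarithms of one-step normalizing constants, which avoids numerical underflow when $n$ is large. The sketch below illustrates this on a finite state-space, under the simplifying assumption (made only for this example) that $M$ does not depend on $\omega$ while $G$ varies with time; the function names are hypothetical.

```python
import math


def log_mass_normalised(mu, G_seq, M):
    """log mu Q_n(1), accumulated as sum_p log Phi_p(mu)(G_p).

    mu    : initial distribution on {0, ..., d-1}
    G_seq : G_seq[p][x] plays the role of G^{theta^p omega}(x)
    M     : transition matrix, assumed not to depend on omega
    """
    d = len(mu)
    eta, total = list(mu), 0.0
    for G in G_seq:
        c = sum(eta[x] * G[x] for x in range(d))  # one-step normalizing constant
        total += math.log(c)
        eta = [sum(eta[x] * G[x] * M[x][z] for x in range(d)) / c
               for z in range(d)]                 # apply the normalized operator
    return total


def log_mass_direct(mu, G_seq, M):
    """log mu Q_n(1), computed by un-normalized propagation mu Q ... Q."""
    d = len(mu)
    m = list(mu)
    for G in G_seq:
        m = [sum(m[x] * G[x] * M[x][z] for x in range(d)) for z in range(d)]
    return math.log(sum(m))  # rows of M sum to one, so the total mass is mu Q_n(1)
```

Dividing the output of `log_mass_normalised` by $n$ gives a numerical approximation of the growth rate in Proposition 1 along a given sequence; for long sequences the direct version may underflow to zero, whereas the normalized version does not.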



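Remark 3. Under the uniform bounds of (H2), it is known that $\Phi_n^\omega$ is exponentially stable with respect to
initial conditions [e.g., Del Moral, 2004, Chapter 4] in the sense that there exist constants $C < \infty$ and $\rho < 1$
such that for any $\varphi \in \mathcal{L}(\mathsf{X})$ and any $n \geq 1$,

\[ \sup_{\omega \in \Omega}\ \sup_{\mu, \mu' \in \mathcal{P}(\mathsf{X})} \left| \left[\Phi_n^\omega(\mu) - \Phi_n^\omega(\mu')\right](\varphi) \right| \leq \|\varphi\|\, C \rho^n. \tag{25} \]

(25) is used extensively in the proof of the following proposition, given in section 5.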



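Proposition 2. Assume (H2).
1) Fix $\mu \in \mathcal{P}(\mathsf{X})$. Then the limits

\[ \eta^\omega(A) := \lim_{n \to \infty} \Phi_n^{\theta^{-n}\omega}(\mu)(A), \qquad \omega \in \Omega,\ A \in \mathcal{X}, \tag{26} \]

\[ h(\omega, x) := \lim_{n \to \infty} \frac{Q_n^\omega(1)(x)}{\Phi_n^{\theta^{-n}\omega}(\mu) Q_n^\omega(1)}, \qquad \omega \in \Omega,\ x \in \mathsf{X}, \tag{27} \]

exist and define a family of probability measures $\eta := \{\eta^\omega \in \mathcal{P}(\mathsf{X}); \omega \in \Omega\}$, and an $\mathcal{F} \otimes \mathcal{X}$-measurable function
$h : \Omega \times \mathsf{X} \to \mathbb{R}_+$.

2) In fact, $\eta$ and $h$ are independent of the particular $\mu$ chosen in part 1) and there exist constants $C < \infty$
and $\rho < 1$ such that for any $\varphi \in \mathcal{L}(\mathsf{X})$,

\[ \sup_{\omega \in \Omega}\ \sup_{\mu \in \mathcal{P}(\mathsf{X})} \left| \left[\Phi_n^{\theta^{-n}\omega}(\mu) - \eta^\omega\right](\varphi) \right| \leq \|\varphi\|\, C \rho^n, \qquad n \geq 1, \tag{28} \]

and

\[ \sup_{\omega \in \Omega}\ \sup_{\mu \in \mathcal{P}(\mathsf{X})}\ \sup_{x \in \mathsf{X}} \left| \frac{Q_n^\omega(1)(x)}{\Phi_n^{\theta^{-n}\omega}(\mu) Q_n^\omega(1)} - h(\omega, x) \right| \leq C \rho^n, \qquad n \geq 1. \tag{29} \]

3) $\lambda : \omega \in \Omega \longmapsto \eta^\omega(G^\omega) \in \mathbb{R}_+$ is measurable w.r.t. $\mathcal{F}$ and we have the bounds

\[ \sup_{\omega, \omega'} \frac{\lambda_\omega}{\lambda_{\omega'}} < \infty, \qquad \sup_{\omega, \omega', x, x'} \frac{h(\omega, x)}{h(\omega', x')} < \infty. \tag{30} \]

4) Amongst all triples which consist of: (i) an $\Omega$-indexed family of probability measures on $(\mathsf{X}, \mathcal{X})$, (ii) an
$\mathbb{R}_+$-valued, not identically zero, measurable function on $\Omega \times \mathsf{X}$, and (iii) a measurable function on $\Omega$; the
triple $(\eta, h, \lambda)$, with $\eta$, $h$ as in part 1) and $\lambda$ as in part 3), uniquely satisfies the system of equations

\[ \eta^\omega Q^\omega = \lambda_\omega \eta^{\theta\omega}, \qquad Q^\omega(h^{\theta\omega}) = \lambda_\omega h^\omega, \qquad \eta^\omega(h^\omega) = 1, \qquad \text{for all } \omega \in \Omega. \tag{31} \]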



The connection with Proposition 1 is:

Proposition 3. Assume (H1), (H2) and let $\Lambda$ be as in Proposition 1 and $\lambda$ be as in Proposition 2. Then

\[ \Lambda = \mathbb{E}[\log \lambda] = \int_\Omega \log \frac{Q^\omega(h^{\theta\omega})(x)}{h^\omega(x)}\, \mathbb{P}(d\omega), \qquad \text{for any } x \in \mathsf{X}. \tag{32} \]

In the setting of HMMs as per Remark 1, equalities like the first one in (32) appear routinely in the study
of likelihood-based estimators [Leroux, 1992; Douc and Moulines, 2011]. However, it is the second equality
in (32), and generalizations thereof, which shall be crucial for our purposes in the sequel.



Remark 4. If one weakens the "1-step" condition (18) to an $m$-step version for some $m \geq 1$, then Propositions
1-3 can easily be generalized, since one can simply work with the kernel $Q_m^\omega$ instead of $Q^\omega$. The utility of
imposing the uniform in $\omega$ and $x$ bounds in (H2) is that the statements of Proposition 2 hold for all $\omega \in \Omega$,
and without having to invoke (H1). If one allows $\omega$-dependent constants and measures in (17) and (18), but
instead imposes certain explicit compactness and continuity assumptions and (H1), then [Kifer, 1996, Theorem
3.1] provides a partial alternative to our Proposition 2.

We shall proceed by introducing the laws of the particle systems of interest.



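2.3 Law of the standard particle system

Unless stated otherwise, in this section we fix arbitrarily $N \geq 1$ and write $\mathcal{P}(\mathsf{X}^N)$ for the collection of
probability measures on $(\mathsf{X}^N, \mathcal{X}^{\otimes N})$.

Let $\mathbf{M} : \Omega \times \mathsf{X}^N \times \mathcal{X}^{\otimes N} \to [0, 1]$ be given, in integral form, by

\[ \mathbf{M}(\omega, x, dz) = \prod_{i=1}^{N} \frac{\sum_{j=1}^{N} G(\omega, x^j)\, M(\omega, x^j, dz^i)}{\sum_{j=1}^{N} G(\omega, x^j)}, \tag{33} \]

where $x = (x^1, \ldots, x^N)$, $z = (z^1, \ldots, z^N) \in \mathsf{X}^N$. Each member of the family $\{\mathbf{M}^\omega; \omega \in \Omega\}$ is a Markov
transition kernel for the entire $N$-particle system according to a "multinomial" resampling scheme with
fitness function $G(\omega, \cdot)$, followed by conditionally independent mutation according to $M^\omega$.

Now for any given $\omega \in \Omega$, we shall denote by $\mathbf{E}_N^\omega$ expectation with respect to the law of the Markov chain
$\{\zeta_n; n \geq 0\}$, with each $\zeta_n = (\zeta_n^1, \ldots, \zeta_n^N)$ valued in $\mathsf{X}^N$ and

\[ \zeta_0 \sim \mu_0^{\otimes N}, \qquad \zeta_n \mid \zeta_{n-1} \sim \mathbf{M}^{\theta^{n-1}\omega}(\zeta_{n-1}, \cdot). \tag{34} \]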
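Define

\[ \mathbf{G} : (\omega, x) \in \Omega \times \mathsf{X}^N \longmapsto \frac{1}{N} \sum_{i=1}^{N} G(\omega, x^i) \in \mathbb{R}_+, \]

where $x = (x^1, \ldots, x^N)$.

Remark 5. For any $\varphi \in \mathcal{L}(\mathsf{X})$, if we define the function

\[ \boldsymbol{\varphi} : x = (x^1, \ldots, x^N) \in \mathsf{X}^N \longmapsto \frac{1}{N} \sum_{i=1}^{N} \varphi(x^i), \]

then the well known lack-of-bias property of the particle approximation reads as

\[ \mathbf{E}_N^\omega\left[\prod_{p=0}^{n-1} \mathbf{G}^{\theta^p\omega}(\zeta_p)\, \boldsymbol{\varphi}(\zeta_n)\right] = E_{\mu_0}^\omega\left[\prod_{p=0}^{n-1} G^{\theta^p\omega}(X_p)\, \varphi(X_n)\right]. \tag{35} \]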



Remark 6. When $M^\omega(x, \cdot) = f(x, \cdot)$ and $G^\omega(x) = g(x, Y_0(\omega))$, the sampling recipe for simulating the
process $\{\zeta_n; n \geq 0\}$ according to (34) is the bootstrap particle filter: Algorithm 1. Furthermore, the particle
approximation of $Z_n^\omega$ is then given by

\[ \prod_{p=0}^{n-1} \mathbf{G}^{\theta^p\omega}(\zeta_p). \]

Part of our investigation will be to develop some limit theory for

\[ \mathbf{E}_N^\omega\left[\left(\prod_{p=0}^{n-1} \mathbf{G}^{\theta^p\omega}(\zeta_p)\right)^2\right] \tag{36} \]

when $N$ is fixed and $n \to \infty$. Our notation $\mathbf{G}$, $\mathbf{M}$ and the display (35) are intended to hint that the type
of arguments employed in the proof of Proposition 1 and the phenomena described in Proposition 2 may be
relevant to the study of (36). Indeed this is the direction in which we are heading. However we will actually
study an object more general than (36), arising from a more general form of particle approximation, which
we now introduce. The generality arises as we consider the particle system distributed according to some
Markov law, possibly different to (34).

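2.4 Alternative sampling of the particle system

Let us introduce $\widetilde{\mathbf{M}} : \Omega \times \mathsf{X}^N \times \mathcal{X}^{\otimes N} \to [0, 1]$, possibly different from $\mathbf{M}$. For fixed $\omega$, now denote by $\widetilde{\mathbf{E}}_N^\omega$
expectation with respect to the law of the Markov chain

\[ \zeta_0 \sim \mu_0^{\otimes N}, \qquad \zeta_n \mid \zeta_{n-1} \sim \widetilde{\mathbf{M}}^{\theta^{n-1}\omega}(\zeta_{n-1}, \cdot), \qquad n \geq 1. \tag{37} \]

We are going to specify a class of candidates for $\widetilde{\mathbf{M}}$, and we first notice that the regularity condition (H2)
transfers to $\mathbf{G}$, $\mathbf{M}$ in the following sense (the proof is elementary):

Lemma 1. Assume (H2). Then for any $N \geq 1$,

\[ \frac{\mathbf{G}(\omega, x)}{\mathbf{G}(\omega', x')} \leq \beta, \qquad \forall (\omega, \omega', x, x') \in \Omega^2 \times \mathsf{X}^{2N}, \]

\[ \epsilon_-^N \nu^{\otimes N}(\cdot) \leq \mathbf{M}(\omega, x, \cdot) \leq \epsilon_+^N \nu^{\otimes N}(\cdot), \qquad \forall (\omega, x) \in \Omega \times \mathsf{X}^N. \]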

We shall consider the following family of kernels. 
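Definition. ($\mathbb{M}$) Any $\widetilde{\mathbf{M}} : \Omega \times \mathsf{X}^N \times \mathcal{X}^{\otimes N} \to [0, 1]$ is a member of $\mathbb{M}$ if and only if there exist constants
$(\widetilde{\epsilon}_-, \widetilde{\epsilon}_+) \in (0, \infty)^2$ and $\widetilde{\nu} \in \mathcal{P}(\mathsf{X}^N)$ such that

\[ \widetilde{\epsilon}_-\, \widetilde{\nu}(\cdot) \leq \widetilde{\mathbf{M}}(\omega, x, \cdot) \leq \widetilde{\epsilon}_+\, \widetilde{\nu}(\cdot), \qquad \forall (\omega, x) \in \Omega \times \mathsf{X}^N, \tag{38} \]

\[ \nu^{\otimes N} \ll \widetilde{\nu}, \quad \text{and} \quad \int_{\mathsf{X}^N} \left(\frac{d\nu^{\otimes N}}{d\widetilde{\nu}}(x)\right)^2 \widetilde{\nu}(dx) < \infty, \tag{39} \]

where $\nu$ is as in (H2).

When $\widetilde{\mathbf{M}}$ is a member of $\mathbb{M}$ we write

\[ \boldsymbol{\phi}^\omega(x, x') := \frac{d\mathbf{M}^\omega(x, \cdot)}{d\widetilde{\mathbf{M}}^\omega(x, \cdot)}(x'), \qquad (\omega, x, x') \in \Omega \times \mathsf{X}^{2N}, \]

and in the context of sampling the particle system $(\zeta_n; n \geq 0)$ under the law (37), we will take

\[ \prod_{p=0}^{n-1} \mathbf{G}^{\theta^p\omega}(\zeta_p)\, \boldsymbol{\phi}^{\theta^p\omega}(\zeta_p, \zeta_{p+1}) \tag{40} \]

as an approximation of $E_{\mu_0}^\omega\left[\prod_{p=0}^{n-1} G^{\theta^p\omega}(X_p)\right]$.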



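Clearly (40) is unbiased under (37), in the sense that

\[ \widetilde{\mathbf{E}}_N^\omega\left[\prod_{p=0}^{n-1} \mathbf{G}^{\theta^p\omega}(\zeta_p)\, \boldsymbol{\phi}^{\theta^p\omega}(\zeta_p, \zeta_{p+1})\right] = \mathbf{E}_N^\omega\left[\prod_{p=0}^{n-1} \mathbf{G}^{\theta^p\omega}(\zeta_p)\right] = E_{\mu_0}^\omega\left[\prod_{p=0}^{n-1} G^{\theta^p\omega}(X_p)\right], \]

where the first equality is immediate under (37) and (40), and the second equality is the lack-of-bias property
as per Remark 5. The following result describes the $n \to \infty$ behaviour of

\[ V_{n,N}^\omega := \frac{\widetilde{\mathbf{E}}_N^\omega\left[\left(\prod_{p=0}^{n-1} \mathbf{G}^{\theta^p\omega}(\zeta_p)\, \boldsymbol{\phi}^{\theta^p\omega}(\zeta_p, \zeta_{p+1})\right)^2\right]}{\left(E_{\mu_0}^\omega\left[\prod_{p=0}^{n-1} G^{\theta^p\omega}(X_p)\right]\right)^2}. \tag{41} \]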



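Its proof, provided in section 5, starts by considering the family of kernels $\{\mathbf{R}^{\omega}; \omega \in \Omega\}$ with

$$\mathbf{R}^{\omega}(x,dx') := \mathbf{G}^{\omega}(x)^2\,\phi^{\omega}(x,x')^2\,\widetilde{\mathbf{M}}^{\omega}(x,dx'),$$

in terms of which the numerator of (41) may be written, and which exhibit exactly similar properties to those of the kernels appearing in the proof of Proposition 1.

Proposition 4. Assume (H1), (H2) and fix $N \ge 1$ arbitrarily. For every $\widetilde{\mathbf{M}} \in \mathbb{M}$ there exists a constant $\Upsilon_N(\widetilde{\mathbf{M}}) \in [0,\infty)$, independent of the initial distribution $\mu_0$, such that

$$\frac{1}{n}\log \widetilde{\mathcal{V}}^{\omega}_{n,N} \longrightarrow \Upsilon_N(\widetilde{\mathbf{M}}), \quad \text{as } n \to \infty, \quad \mathbb{P}\text{-a.s.}$$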
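As an informal sanity check of the lack-of-bias property of (40) and of the ratio in (41), the following Python sketch enumerates all paths of a toy two-state, time-homogeneous model with $N = 1$, in which case $\mathbf{G} = G$ and $\mathbf{M} = M$. The numerical values, the kernel `M_alt` standing in for $\widetilde{\mathbf{M}}$, and the function names are illustrative assumptions and do not come from the analysis above.

```python
from itertools import product

# Toy time-homogeneous model on the two-point space {0, 1}; all values are hypothetical.
mu0 = [0.5, 0.5]                     # initial distribution
G = [1.0, 0.4]                       # potential G(x)
M = [[0.9, 0.1], [0.3, 0.7]]         # reference kernel M(x, x')
M_alt = [[0.6, 0.4], [0.5, 0.5]]     # alternative sampling kernel, strictly positive

def phi(x, y):
    """Radon-Nikodym derivative dM(x, .)/dM_alt(x, .) evaluated at y."""
    return M[x][y] / M_alt[x][y]

def moments(n, kernel, weight):
    """First and second moments of prod_p weight(x_p, x_{p+1}) under `kernel`,
    computed exactly by enumerating every path (x_0, ..., x_n)."""
    m1 = m2 = 0.0
    for path in product(range(2), repeat=n + 1):
        prob = mu0[path[0]]
        w = 1.0
        for p in range(n):
            prob *= kernel[path[p]][path[p + 1]]
            w *= weight(path[p], path[p + 1])
        m1 += prob * w
        m2 += prob * w * w
    return m1, m2

n = 6
# Target quantity: expectation of prod_p G(X_p) under the reference kernel M.
target, _ = moments(n, M, lambda x, y: G[x])
# Estimator (40) sampled under M_alt and corrected by phi.
est_mean, est_m2 = moments(n, M_alt, lambda x, y: G[x] * phi(x, y))
V = est_m2 / target ** 2             # N = 1 analogue of the ratio in (41)
print(target, est_mean, V)
```

The two means agree up to rounding, and the normalized second moment `V` is at least one, as expected from Jensen's inequality.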



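We now proceed to address the question of how $\Upsilon_N(\widetilde{\mathbf{M}})$ depends on $\widetilde{\mathbf{M}}$. To this end, let us introduce two further pieces of notation:

$$\mathbf{Q}(\omega,x,dx') := \mathbf{G}(\omega,x)\,\mathbf{M}(\omega,x,dx'),$$

and when (H2) holds, so that $h$ as in Proposition 2 is well-defined, consider the function

$$\mathbf{h} : (\omega,x) \in \Omega \times \mathsf{X}^N \longmapsto \frac{1}{N}\sum_{i=1}^N h(\omega,x^i) \in \mathbb{R}_+. \tag{42}$$

Our interest in (42) stems from the following pivotal lemma, which shows how the eigen-function $h$ and eigen-value $\lambda$ of $Q$ appearing in Proposition 2 define an eigen-function and eigen-value for $\mathbf{Q}$. Recall that $N \ge 1$ is still arbitrarily fixed.

Lemma 2. For any $\omega \in \Omega$,

$$\mathbf{Q}^{\omega}(\mathbf{h}^{\theta\omega})(x) = \lambda_{\omega}\,\mathbf{h}^{\omega}(x), \quad \forall x \in \mathsf{X}^N,$$

where $\lambda$ is as in Proposition 2.

Proof.

$$\begin{aligned}
\mathbf{Q}^{\omega}(\mathbf{h}^{\theta\omega})(x)
&= \mathbf{G}^{\omega}(x)\int_{\mathsf{X}^N}\mathbf{M}^{\omega}(x,dz)\,\frac{1}{N}\sum_{k=1}^N h^{\theta\omega}(z^k) \\
&= \mathbf{G}^{\omega}(x)\,\frac{1}{N}\sum_{k=1}^N \frac{\sum_{i=1}^N G^{\omega}(x^i)\,M^{\omega}(h^{\theta\omega})(x^i)}{\sum_{i=1}^N G^{\omega}(x^i)} \\
&= \frac{1}{N}\sum_{i=1}^N Q^{\omega}(h^{\theta\omega})(x^i)
 = \lambda_{\omega}\,\frac{1}{N}\sum_{i=1}^N h^{\omega}(x^i) = \lambda_{\omega}\,\mathbf{h}^{\omega}(x).
\end{aligned}$$

□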
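The eigen-relation of Lemma 2 can be checked numerically on a small example. The sketch below is only an illustration under stated assumptions: a time-homogeneous two-state model (so the shift $\theta$ plays no role), $N = 2$ particles, the multinomial-resampling form of $\mathbf{M}$ in which each new particle is drawn from $\sum_j G(x^j) M(x^j,\cdot) / \sum_j G(x^j)$, and hypothetical numerical values for $G$ and $M$.

```python
from itertools import product

G = [1.0, 0.4]                       # hypothetical potential
M = [[0.9, 0.1], [0.3, 0.7]]         # hypothetical Markov kernel
S = range(2)
N = 2

# Power iteration for Q h = lam * h with Q(x, y) = G(x) M(x, y); h normalised so that max h = 1.
h = [1.0, 1.0]
for _ in range(500):
    Qh = [G[x] * sum(M[x][y] * h[y] for y in S) for x in S]
    lam = max(Qh)
    h = [v / lam for v in Qh]

def bold_G(x):
    """Average potential over the N particles in the tuple x."""
    return sum(G[xi] for xi in x) / N

def bold_M(x, z):
    """Probability of moving from particle configuration x to z under multinomial resampling."""
    total = sum(G[xj] for xj in x)
    p = 1.0
    for zi in z:
        p *= sum(G[xj] * M[xj][zi] for xj in x) / total
    return p

def bold_h(x):
    """Average of h over the N particles in the tuple x, as in (42)."""
    return sum(h[xi] for xi in x) / N

configs = list(product(S, repeat=N))
max_err = 0.0
for x in configs:
    lhs = bold_G(x) * sum(bold_M(x, z) * bold_h(z) for z in configs)
    max_err = max(max_err, abs(lhs - lam * bold_h(x)))
print(lam, h, max_err)
```

The discrepancy `max_err` is at the level of floating-point rounding for every particle configuration, in line with the lemma.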



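Now consider taking

$$\widetilde{\mathbf{M}}^{\omega}(x,dx') := \frac{\mathbf{M}^{\omega}(x,dx')\,\mathbf{h}^{\theta\omega}(x')}{\int_{\mathsf{X}^N}\mathbf{M}^{\omega}(x,dz)\,\mathbf{h}^{\theta\omega}(z)}, \tag{43}$$

which is clearly a member of $\mathbb{M}$, due to the definition of $\mathbf{h}$ and part 4) of Proposition 2. In this case we have

$$\begin{aligned}
\prod_{p=0}^{n-1}\mathbf{G}^{\theta^p\omega}(\zeta_p)\,\phi^{\theta^p\omega}(\zeta_p,\zeta_{p+1})
&= \prod_{p=0}^{n-1}\mathbf{G}^{\theta^p\omega}(\zeta_p)\,\frac{\mathbf{M}^{\theta^p\omega}(\mathbf{h}^{\theta^{p+1}\omega})(\zeta_p)}{\mathbf{h}^{\theta^{p+1}\omega}(\zeta_{p+1})} \\
&= \prod_{p=0}^{n-1}\frac{\mathbf{Q}^{\theta^p\omega}(\mathbf{h}^{\theta^{p+1}\omega})(\zeta_p)}{\mathbf{h}^{\theta^{p+1}\omega}(\zeta_{p+1})} \\
&= \frac{\mathbf{h}^{\omega}(\zeta_0)}{\mathbf{h}^{\theta^n\omega}(\zeta_n)}\prod_{p=0}^{n-1}\lambda_{\theta^p\omega},
\end{aligned} \tag{44}$$

where the final equality is due to Lemma 2. Thus if we choose $\widetilde{\mathbf{M}}$ as per (43), then the quantity in (44) depends on the particle system trajectory $\zeta_0,\ldots,\zeta_n$ only through the quantities $\mathbf{h}^{\omega}(\zeta_0)$ and $\mathbf{h}^{\theta^n\omega}(\zeta_n)$, and we then might hope that $\Upsilon_N(\widetilde{\mathbf{M}}) = 0$. This turns out to be true, and much more strikingly, up to its definition on certain sets of measure zero, $\widetilde{\mathbf{M}}$ as in (43) is the unique member of $\mathbb{M}$ which achieves $\Upsilon_N(\widetilde{\mathbf{M}}) = 0$, in the sense of the following theorem.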

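Theorem 1. Assume (H1), (H2), let $N \ge 1$ be fixed arbitrarily and let $\widetilde{\mathbf{M}}$ be any member of $\mathbb{M}$. Then 1)-3) are equivalent.

1) $\Upsilon_N(\widetilde{\mathbf{M}}) = 0$.

2) For $\mathbb{P}$-almost all $\omega \in \Omega$ there exists $A_{\omega} \in \mathcal{X}^{\otimes N}$ such that $\nu^{\otimes N}(A_{\omega}^c) = 0$ and for any $x \in A_{\omega}$,

$$\widetilde{\mathbf{M}}^{\omega}(x,B) = \frac{\int_B \mathbf{M}^{\omega}(x,dx')\,\mathbf{h}^{\theta\omega}(x')}{\int_{\mathsf{X}^N}\mathbf{M}^{\omega}(x,dz)\,\mathbf{h}^{\theta\omega}(z)}, \quad \text{for all } B \in \mathcal{X}^{\otimes N}. \tag{45}$$

3) For $\mathbb{P}$-almost all $\omega \in \Omega$, $\sup_n \widetilde{\mathcal{V}}^{\omega}_{n,N} < \infty$.

The proof of Proposition 4 can be found in section 5. Showing 2)$\Rightarrow$3) amounts to little more than equation (44). It is obvious that 3)$\Rightarrow$1). Showing 1)$\Rightarrow$2) is the challenging part.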
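To make the contrast between a standard and an optimally twisted transition concrete, the following sketch computes the ratio in (41) exactly for $N = 1$ on a toy two-state, time-homogeneous model, once with $\widetilde{\mathbf{M}} = \mathbf{M}$ and once with the twisted kernel proportional to $M(x,dx')h(x')$. The model values and helper names are hypothetical; the only facts used are that the numerator of (41) is $\mu_0 R^n(1)$ with $R = G^2\phi^2\widetilde{M}$ and the denominator is $(\mu_0 Q^n(1))^2$.

```python
import math

mu0 = [0.5, 0.5]                     # hypothetical initial distribution
G = [1.0, 0.4]                       # hypothetical potential
M = [[0.9, 0.1], [0.3, 0.7]]         # hypothetical Markov kernel
S = range(2)

def apply(K, f):
    """Apply the kernel K (as a matrix) to the function f (as a list)."""
    return [sum(K[x][y] * f[y] for y in S) for x in S]

Q = [[G[x] * M[x][y] for y in S] for x in S]

# Eigen-function h and eigen-value lam of Q by power iteration.
h = [1.0, 1.0]
for _ in range(500):
    Qh = apply(Q, h)
    lam = max(Qh)
    h = [v / lam for v in Qh]

# Twisted kernel proportional to M(x, y) h(y), the N = 1 instance of (43).
Mh = apply(M, h)
M_opt = [[M[x][y] * h[y] / Mh[x] for y in S] for x in S]

def R_kernel(M_alt):
    """R(x, y) = G(x)^2 phi(x, y)^2 M_alt(x, y), where phi = M / M_alt."""
    return [[G[x] ** 2 * M[x][y] ** 2 / M_alt[x][y] for y in S] for x in S]

def V(n, M_alt):
    """Normalised second moment of the estimator after n steps when sampling from M_alt."""
    R = R_kernel(M_alt)
    r = [1.0, 1.0]
    q = [1.0, 1.0]
    for _ in range(n):
        r = apply(R, r)
        q = apply(Q, q)
    num = sum(mu0[x] * r[x] for x in S)
    den = sum(mu0[x] * q[x] for x in S) ** 2
    return num / den

for n in (10, 50, 100):
    print(n, math.log(V(n, M)) / n, V(n, M_opt))
```

In this toy example the standard choice gives a strictly positive exponential growth rate, whereas under the twisted kernel the ratio settles to a finite constant, which is the behaviour described by points 1) and 3) of Theorem 1.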

The following Lemma serves to accompany Proposition 4 and Theorem 1 and provides necessary and sufficient conditions for $\Upsilon_N(\widetilde{\mathbf{M}}) = 0$ in the case of taking $\widetilde{\mathbf{M}} = \mathbf{M}$, i.e. the transitions of the standard particle system. The proof is given in section 5.

Lemma 3. Assume (H1), (H2) and let $N \ge 1$ be fixed arbitrarily. Then 1)-3) are equivalent.

1) $\Upsilon_N(\mathbf{M}) = 0$.

2) For $\mathbb{P}$-a.a. $\omega$, $h^{\omega}(x) = 1$, for $\nu$-a.a. $x$.

3) There exists a random variable $C : \Omega \to \mathbb{R}_+$ such that for $\mathbb{P}$-a.a. $\omega$, $G^{\omega}(x) = C_{\omega}$, for $\nu$-a.a. $x$.

In situations of practical interest, point 3) of Lemma 3 is usually false, and then it must be the case that $\Upsilon_N(\mathbf{M}) > 0$. It then appears that a choice of $\widetilde{\mathbf{M}}$ which approximates the optimal transition, (45), may yield a provable performance advantage over $\mathbf{M}$, in the sense of achieving strict inequality $\Upsilon_N(\widetilde{\mathbf{M}}) < \Upsilon_N(\mathbf{M})$. This leads us to consider the class of particle algorithms treated in the next section.

3 Twisted particle algorithms 



The form of the optimal transition kernel (45) suggests that we consider families of kernels arising from re-weighting of $\mathbf{M}^{\omega}(x,\cdot)$ by an additive, non-negative functional. In this section we will analyze particle algorithms arising from kernels of this general form. Let $\psi : \Omega \times \mathsf{X} \to \mathbb{R}_+$ be a strictly positive, bounded and measurable function and then define

$$\boldsymbol{\psi}^{\omega}(x) := \frac{1}{N}\sum_{i=1}^N \psi^{\omega}(x^i), \quad x = (x^1,\ldots,x^N) \in \mathsf{X}^N.$$

For the purposes of this section, let us consider the following mild regularity assumption.

(H3) For each $\omega \in \Omega$, $\sup_x G^{\omega}(x) < \infty$ and $\sup_x \psi^{\omega}(x) < \infty$.

When (H3) holds the following Markov kernel is well-defined:

$$\widetilde{\mathbf{M}}^{\omega}(x,dx') := \frac{\mathbf{M}^{\omega}(x,dx')\,\boldsymbol{\psi}^{\theta\omega}(x')}{\int_{\mathsf{X}^N}\mathbf{M}^{\omega}(x,dz)\,\boldsymbol{\psi}^{\theta\omega}(z)}. \tag{46}$$

We shall analyze particle approximations which arise from sampling under (46). Our motivation here is that we have in mind situations where $\psi$ is chosen to be some approximation of $h$, assuming the latter exists. The kernel (46) accommodates the standard transition (33) (e.g. take $\psi^{\omega} = 1$) and the optimal transition identified in Theorem 1 (take $\psi^{\omega} = h^{\omega}$).

This section addresses two main objectives. Firstly, to validate the particle approximations delivered when sampling under (46), by analyzing some of their convergence and fluctuation properties in the regime where $N \to \infty$. Secondly, to provide an estimate of $\Upsilon_N(\widetilde{\mathbf{M}})$ which exhibits dependence on $N$ and on the discrepancy between $\psi^{\omega}$ and $h^{\omega}$.

Let us introduce a little more notation. Define, for each cj G f2, 

l^o■.= ^io, 7?:^ n>l. 
$-:7'(X)^P(X), $-(M)(dx) M e 7'(X). 
and also for each $N \ge 1$,

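For purposes of developing limits and fluctuation studies in the regime $N \to \infty$, it is convenient to construct the particle system of interest as follows.

Let $\mathsf{K}_N$ be the set of all bijections between $\{1,\ldots,N\}$ and itself, and let its power set be $\mathcal{K}_N$. Let $\mathsf{Z}_N := (\mathsf{X}^N \times \mathsf{K}_N)^{\mathbb{N}}$ be the set of infinite sequences valued in $\mathsf{X}^N \times \mathsf{K}_N$, and endow it with the $\sigma$-algebra $\mathcal{Z}_N := (\mathcal{X}^{\otimes N} \otimes \mathcal{K}_N)^{\otimes \mathbb{N}}$, so as to form a measurable space $(\mathsf{Z}_N, \mathcal{Z}_N)$. Let $\{(\check{\zeta}_n, k_n); n \ge 0\}$ be the coordinate process on $\mathsf{Z}_N$.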
We shall write 

1 ^ 

1=1 



n > 1. 



_ 1 ^ (e-i) (^'^"") 

n-l 
n-l 

7o — inm — VnmYYvp{G j^^^i, n>i. 

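Now for some fixed $\omega \in \Omega$, let us introduce a probability measure, $\check{\mathsf{P}}_N^{\omega}$, on $(\mathsf{Z}_N, \mathcal{Z}_N)$ according to the following prescription. Under $\check{\mathsf{P}}_N^{\omega}$, the sequences $\{\check{\zeta}_n; n \ge 0\}$ and $\{k_n; n \ge 0\}$ are independent. The sequence $\{\check{\zeta}_n; n \ge 0\}$ is a Markov chain with the following law. At time $n = 0$, the random variables $\{\check{\zeta}_0^i\}_{i=1}^N$ are independent and identically distributed according to $\mu_0$. At time $n \ge 1$, the random variables $\{\check{\zeta}_n^i\}_{i=1}^N$ are conditionally independent given $\check{\zeta}_{n-1}$, and

$$\check{\zeta}_n^1 \mid \check{\zeta}_{n-1} \sim \widetilde{\Phi}^{\theta^{n-1}\omega}(\eta^N_{n-1}), \qquad \check{\zeta}_n^i \mid \check{\zeta}_{n-1} \sim \Phi^{\theta^{n-1}\omega}(\eta^N_{n-1}), \quad i = 2,\ldots,N. \tag{47}$$

Finally, the components of the sequence $\{k_n; n \ge 0\}$ are independent and identically distributed according to the uniform distribution on $(\mathsf{K}_N, \mathcal{K}_N)$. Expectation under $\check{\mathsf{P}}_N^{\omega}$ will be denoted by $\check{\mathsf{E}}_N^{\omega}$.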

Remark 7. Our interest in this construction is that if we define, for each $n \ge 0$ and $i = 1,\ldots,N$, the random variables

$$\zeta_n^i := \check{\zeta}_n^{k_n(i)},$$

we obviously have the identity of empirical measures

$$\eta_n^N = \frac{1}{N}\sum_{i=1}^N \delta_{\check{\zeta}_n^i} = \frac{1}{N}\sum_{i=1}^N \delta_{\zeta_n^i}, \quad n \ge 0. \tag{48}$$

Furthermore, using the fact that with $\widetilde{\mathbf{M}}$ as in (46), for any $(\omega, A) \in \Omega \times \mathcal{X}^{\otimes N}$, $\widetilde{\mathbf{M}}^{\omega}(x,A)$ is invariant to permutations of the coordinates $x = (x^1,\ldots,x^N)$, it is straightforward to check that under $\check{\mathsf{P}}_N^{\omega}$, the process $\{\zeta_n; n \ge 0\}$ is Markov, with

$$\zeta_n \mid \zeta_{n-1} \sim \widetilde{\mathbf{M}}^{\theta^{n-1}\omega}(\zeta_{n-1},\cdot), \quad n \ge 1, \tag{49}$$

which is exactly the transition law of interest. The important thing here is that the identity of random measures (48) permits us, by construction, to perform asymptotic analysis of functionals of the empirical measures $\{\frac{1}{N}\sum_{i=1}^N \delta_{\zeta_n^i}; n \ge 0\}$ through study of the random variables $\{\check{\zeta}_n; n \ge 0\}$ rather than $\{\zeta_n; n \ge 0\}$: the conditional independence structure of the former as per (47) is easier to work with than that of the latter under (49).



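To connect with the notation of previous sections, we note that under the above definitions, and for $\widetilde{\mathbf{M}}$ as in (46), we have

$$\frac{\int_{\mathsf{X}^N}\mathbf{M}^{\theta^{n-1}\omega}(\zeta_{n-1},dz)\,\boldsymbol{\psi}^{\theta^n\omega}(z)}{\boldsymbol{\psi}^{\theta^n\omega}(\zeta_n)} = \phi^{\theta^{n-1}\omega}(\zeta_{n-1},\zeta_n),$$

where in the final display, $\phi$ is as in (39). We then have

$$\prod_{p=0}^{n-1}\mathbf{G}^{\theta^p\omega}(\zeta_p)\,\phi^{\theta^p\omega}(\zeta_p,\zeta_{p+1}) = \prod_{p=0}^{n-1}\frac{\sum_{i=1}^N G^{\theta^p\omega}(\zeta_p^i)\,M^{\theta^p\omega}(\psi^{\theta^{p+1}\omega})(\zeta_p^i)}{\sum_{i=1}^N \psi^{\theta^{p+1}\omega}(\zeta_{p+1}^i)}.$$

Introducing the Markov kernel

$$\widetilde{M}^{\omega}(x,dx') \propto M^{\omega}(x,dx')\,\psi^{\theta\omega}(x'),$$

Algorithm 3 gives an algorithmic procedure for sampling the particle system according to (49). Here $K_n$ and $A_n$ are just some auxiliary random variables introduced for algorithmic convenience.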



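Algorithm 3 Twisted particle algorithm
For $n = 0$,
    Sample $(\zeta_0^i)_{i=1}^N$ i.i.d. from $\mu_0$
For $n \ge 1$,
    Sample $K_n$ from the uniform distribution on $\{1,\ldots,N\}$
    Sample $A_n$ from the distribution on $\{1,\ldots,N\}$ with probabilities proportional to $\{G^{\theta^{n-1}\omega}(\zeta_{n-1}^i)\,M^{\theta^{n-1}\omega}(\psi^{\theta^n\omega})(\zeta_{n-1}^i)\}_{i=1}^N$
    Sample $\zeta_n^{K_n} \mid \{A_n, K_n, (\zeta_{n-1}^i)_{i=1}^N\} \sim \widetilde{M}^{\theta^{n-1}\omega}(\zeta_{n-1}^{A_n}, \cdot)$
    Sample $(\zeta_n^i)_{i \ne K_n} \mid \{K_n, (\zeta_{n-1}^i)_{i=1}^N\}$ i.i.d. from $\sum_{j=1}^N G^{\theta^{n-1}\omega}(\zeta_{n-1}^j)\,M^{\theta^{n-1}\omega}(\zeta_{n-1}^j,\cdot) \,/\, \sum_{j=1}^N G^{\theta^{n-1}\omega}(\zeta_{n-1}^j)$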
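The steps of Algorithm 3 can be sketched in Python for a finite state space, where $M(\psi)$ is available as a finite sum. This is a minimal illustration under assumptions that are not part of the paper: a time-homogeneous two-state toy model with hypothetical values of $\mu_0$, $G$ and $M$, a hand-written categorical sampler `draw`, and the choice $\psi = h$ so that the running product of incremental weights can be compared with the closed form in (44).

```python
import random

mu0 = [0.5, 0.5]                     # hypothetical initial distribution
G = [1.0, 0.4]                       # hypothetical potential
M = [[0.9, 0.1], [0.3, 0.7]]         # hypothetical Markov kernel
S = range(2)

def draw(weights, rng):
    """Sample an index with probability proportional to `weights`."""
    u = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if u < acc:
            return i
    return len(weights) - 1

def twisted_step(particles, psi, rng):
    """One transition of the twisted particle system, returning the new particles
    and the incremental weight G(zeta_{n-1}) * phi(zeta_{n-1}, zeta_n)."""
    N = len(particles)
    M_psi = [sum(M[x][y] * psi[y] for y in S) for x in S]          # M(psi)(x)
    k = rng.randrange(N)                                            # K_n: slot of the twisted particle
    a = draw([G[x] * M_psi[x] for x in particles], rng)             # A_n: its ancestor
    new = []
    for i in range(N):
        if i == k:
            x = particles[a]
            new.append(draw([M[x][y] * psi[y] for y in S], rng))    # kernel proportional to M(x, .) psi(.)
        else:
            x = particles[draw([G[xj] for xj in particles], rng)]   # multinomial resampling
            new.append(draw(M[x], rng))                             # standard mutation
    weight = sum(G[x] * M_psi[x] for x in particles) / sum(psi[y] for y in new)
    return new, weight

# psi = h, the eigen-function of Q(x, y) = G(x) M(x, y), found by power iteration.
h = [1.0, 1.0]
for _ in range(500):
    Qh = [G[x] * sum(M[x][y] * h[y] for y in S) for x in S]
    lam = max(Qh)
    h = [v / lam for v in Qh]

rng = random.Random(0)
N, n = 5, 20
zeta0 = [draw(mu0, rng) for _ in range(N)]
zeta, estimate = zeta0, 1.0
for _ in range(n):
    zeta, w = twisted_step(zeta, h, rng)
    estimate *= w

# With psi = h the product of weights depends only on the first and last configurations.
closed_form = lam ** n * sum(h[x] for x in zeta0) / sum(h[x] for x in zeta)
print(estimate, closed_form)
```

Whatever trajectory the random number generator produces, the two printed numbers agree up to rounding, which is the path-wise identity (44) in this toy setting. With a general $\psi$ the same function applies, but the product of weights is then genuinely random.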



3.1 Analysis for N → ∞



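Lemma 4. Assume (H3). Then for each $\omega \in \Omega$, $n \ge 1$, $\mu \in \mathcal{P}(\mathsf{X})$ and $p \ge 1$ there exist finite constants $B^{\omega}_{n,p}$ and $C^{\omega}_{n,p}$ such that for any $\varphi \in \mathcal{L}(\mathsf{X})$,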



[r?^,n(A^) - (^)|<ll¥'ll 



N 



-N 



N 



(50) 
(51) 



From this $L_p$ estimate it follows by a standard Borel-Cantelli argument that:

Corollary 1. For each $n \ge 0$, $\omega \in \Omega$ and $\varphi \in \mathcal{L}(\mathsf{X})$,

$$\eta_n^N(\varphi) - \eta_n^{\omega}(\varphi) \longrightarrow 0, \tag{52}$$

$$\gamma_n^N(\varphi) - \gamma_n^{\omega}(\varphi) \longrightarrow 0, \tag{53}$$

almost surely, as $N \to \infty$.

Lemma 5 is an extension of the CLT of Del Moral [2004, Chapter 9, Theorem 9.3.1] to the case of random test functions and the setting of the twisted particle system. It is established by an application of Jacod and Shiryaev [2002, Theorem 3.33, p. 478].



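For $d \ge 1$ and $m \ge 1$ let $\varphi : (p,q,x) \in \mathbb{N} \times \{1,\ldots,m\} \times \mathsf{X} \mapsto \varphi_{p,q}(x) \in \mathbb{R}^d$ be a bounded measurable function and for each $N \ge 1$ let $\beta^N = \{\beta^N_{p,q}; (p,q) \in \mathbb{N} \times \{1,\ldots,m\}\}$ be a collection of random variables on $(\mathsf{Z}_N, \mathcal{Z}_N, \check{\mathsf{P}}_N^{\omega})$ such that for any $p \ge 1$ and $1 \le q \le m$, $\beta^N_{p,q}$ is measurable w.r.t. $\sigma(\check{\zeta}_0,\ldots,\check{\zeta}_{p-1})$ and $\{\beta^N_{0,q}; q \in \{1,\ldots,m\}\}$ are constants. Then let the random function $f^N : (p,x) \in \mathbb{N} \times \mathsf{X} \mapsto f^N_p(x) \in \mathbb{R}^d$ be defined by

$$f^N_p(x) := \sum_{q=1}^m \beta^N_{p,q}\,\varphi_{p,q}(x),$$

and for $n \ge 0$ define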



p=0 

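with the convention that $\Phi^{\theta^{-1}\omega}(\eta^N_{-1}) = \mu_0$. Clearly $\{\mathcal{M}^{\omega,N}_n(f^N); n \ge 0\}$ is an $\mathbb{R}^d$-valued martingale w.r.t. the natural filtration of $\{\check{\zeta}_n; n \ge 0\}$.

Lemma 5. Assume (H3) and that there exist deterministic and finite constants $\{\beta_{p,q}; (p,q) \in \mathbb{N} \times \{1,\ldots,m\}\}$ such that for each $p, q$,

$$\beta^N_{p,q} \longrightarrow \beta_{p,q} \tag{54}$$

in probability as $N \to \infty$. Then with

$$f_p(x) := \sum_{q=1}^m \beta_{p,q}\,\varphi_{p,q}(x),$$

for any fixed $\omega \in \Omega$, the $\mathbb{R}^d$-valued martingale $\{\sqrt{N}\mathcal{M}^{\omega,N}_n(f^N); n \ge 0\}$ converges in law to an $\mathbb{R}^d$-valued Gaussian martingale $\{\mathcal{M}^{\omega}_n(f); n \ge 0\}$ such that for any $1 \le i,j \le d$,

$$\forall n \ge 0, \quad \langle \mathcal{M}^{\omega}(f^i), \mathcal{M}^{\omega}(f^j)\rangle_n = \sum_{p=0}^n \eta_p^{\omega}\big[(f_p^i - \eta_p^{\omega}(f_p^i))(f_p^j - \eta_p^{\omega}(f_p^j))\big].$$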

Now define 



^tj/ , Q dx') -^uj Q'^Q^'^ ...Q^ 



and notice that 

IJ-oQn = Vn- 

Theorem 2. Assume (H3). Then for any n > 0, uj G il and ip G C{X), 

^[Vn{f)-V^iv)] AA(0,<J^)), (55) 

y^m^''{'p)-iM => m,'^nj^)), (56) 

as $N \to \infty$, where
n r



p=0 



and 



p=0 



(57) 



(58) 



Remark 8. The asymptotic variance expression (57) is independent of the particular choice of $\psi$ (subject to (H3) of course), and is thus exactly the same expression obtained in the CLT for the standard particle system (i.e. $\psi$ constant), see for example Del Moral [2004, Proposition 9.4.2]. However, the asymptotic variance in (58) clearly does depend on $\psi$ in general.



3.2 Analysis for n → ∞

For $\widetilde{\mathbf{M}}$ as in (46), we obtain an estimate of $\Upsilon_N(\widetilde{\mathbf{M}})$ which exhibits its dependence on $N$ and the discrepancy between $\psi$ and $h$. The proof is given in section 5.

Proposition 5. Assume (H1), (H2) and



sup , ,, < oo. 
.u.'L" (a;') 



Then for any N > 2, 



T^(M) < log 



iV- 1 



Vri^P, h) 



where



'Dp{Tp, h) := P — ess sup^ 



2 sup 



sup 



sup 



tp'^iz) ^"(z') 



4 Discussion 

4.1 Sequential Importance Sampling 

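Notice that in the case $N = 1$, we have the identity $\mathbf{M} = M$. This observation leads us to consider how our results can be applied to analyze the variance growth behaviour of Sequential Importance Sampling (SIS), in the following sense.

Let $\widetilde{M} : \Omega \times \mathsf{X} \times \mathcal{X} \to [0,1]$ be a Markov kernel, and for some $L \ge 1$ and any fixed $\omega \in \Omega$, let $\{X_n^i; n \ge 0\}_{i=1}^L$ be $L$ iid time-inhomogeneous Markov chains, each with law

$$X_0^i \sim \mu_0, \qquad X_n^i \mid X_{n-1}^i \sim \widetilde{M}^{\theta^{n-1}\omega}(X_{n-1}^i, \cdot), \quad n \ge 1.$$

Now to connect with the setting of sections 2.3 and 2.4, continue with $N = 1$, and take $\widetilde{\mathbf{M}} := \widetilde{M}$. We shall assume that (H1) and (H2) hold and that $\widetilde{\mathbf{M}}$ is a member of $\mathbb{M}$. The quantity

$$\frac{1}{L}\sum_{i=1}^L \prod_{p=0}^{n-1} G^{\theta^p\omega}(X_p^i)\,\phi^{\theta^p\omega}(X_p^i, X_{p+1}^i) \tag{59}$$

is clearly an unbiased estimator of $\mathsf{E}^{\omega}\big[\prod_{p=0}^{n-1} G^{\theta^p\omega}(X_p)\big]$. Furthermore, since the $L$ Markov chains are independent, for any fixed $\omega$, the relative variance of (59) is

$$\frac{1}{L}\Big[\widetilde{\mathcal{V}}^{\omega}_{n,1} - 1\Big], \tag{60}$$

where $\widetilde{\mathcal{V}}^{\omega}_{n,1}$ is as in (41). By application of Theorem 1 we find that except when (up to the sets of measure zero mentioned in Theorem 1) $\widetilde{M}^{\omega}(x,dx') \propto M^{\omega}(x,dx')\,h^{\theta\omega}(x')$, we have the $\mathbb{P}$-almost-sure convergence:

$$\frac{1}{n}\log \widetilde{\mathcal{V}}^{\omega}_{n,1} \longrightarrow \Upsilon_1(\widetilde{\mathbf{M}}) > 0, \tag{61}$$

and supposing that now $L = L(n)$, the leading term in (60) $\mathbb{P}$-almost-surely behaves according to:

$$\liminf_{n\to\infty}\frac{1}{n}\log\Big(\frac{1}{L(n)}\widetilde{\mathcal{V}}^{\omega}_{n,1}\Big) = \Upsilon_1(\widetilde{\mathbf{M}}) - \limsup_{n\to\infty}\frac{1}{n}\log L(n). \tag{62}$$

In summary, whenever $\Upsilon_1(\widetilde{\mathbf{M}}) > 0$, it is necessary that $L(n)$ must be scaled up exponentially in $n$ in order to prevent exponential growth of the relative variance of (59). In this sense the SIS approach is typically an inefficient method for approximating $\mathsf{E}^{\omega}\big[\prod_{p=0}^{n-1} G^{\theta^p\omega}(X_p)\big]$, at least relative to particle methods, which we shall now discuss.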
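The consequence of (60)-(62) for the number of SIS chains can be illustrated with a few lines of arithmetic. The sketch below assumes, purely for illustration, that the normalized second moment grows exactly like $\exp(n\Upsilon_1)$ with a hypothetical rate `upsilon`, ignoring the sub-exponential factors that the asymptotic statement (61) leaves unspecified; the function names are likewise hypothetical.

```python
import math

def relative_variance(V_n1, L):
    """Relative variance of the average of L independent SIS weights, as in (60)."""
    return (V_n1 - 1.0) / L

def chains_needed(n, upsilon, target):
    """Smallest L keeping (60) below `target` when V_{n,1} is taken to be exp(n * upsilon)."""
    V = math.exp(n * upsilon)
    return max(1, math.ceil((V - 1.0) / target))

upsilon = 0.07   # hypothetical growth rate, strictly positive
for n in (50, 100, 200):
    print(n, chains_needed(n, upsilon, target=0.1))
```

Doubling the time horizon from 100 to 200 multiplies the required number of chains by roughly $\exp(100 \times 0.07)$, which is the exponential scaling described in the summary above.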



4.2 The bootstrap particle filter 



In the case

$$G(\omega,x) := g(x, Y_0(\omega)), \qquad M(\omega,x,dx') := f(x,dx'), \tag{63}$$

we have that $\{\mathbf{M}^{\omega}; \omega \in \Omega\}$ is the collection of the transitions of the bootstrap particle filter, as described in the introduction. When (H1) and (H2) hold, by Lemma 3 we find that in this scenario, for any $N \ge 1$,

$\Upsilon_N(\mathbf{M}) = 0$ if and only if for $\mathbb{P}$-a.a. $\omega$, $\exists A_{\omega} \in \mathcal{X}$ s.t. $\nu(A_{\omega}^c) = 0$ and $g(x, Y_0(\omega))$ is constant on $A_{\omega}$.

The condition of $g(x, Y_0(\omega))$ being constant in $x$ represents an entirely degenerate HMM in which the observations do not provide any information about the hidden state. Thus we concentrate on the situation $\Upsilon_N(\mathbf{M}) > 0$. By an application of Proposition 5 in the case that $\psi^{\omega}(x) = 1$ for all $\omega$ and $x$, and using the bound (30) of Proposition 2, we find that there exists a constant $c < \infty$ such that

$$\frac{1}{n}\log \widetilde{\mathcal{V}}^{\omega}_{n,N} \longrightarrow \Upsilon_N(\mathbf{M}) \le \log\Big[1 + \frac{c}{N-1}\Big], \tag{64}$$

where the convergence holds $\mathbb{P}$-almost surely. The practical importance of this result is that it shows why even the rather basic bootstrap filter is to be preferred over the SIS method in terms of variance-growth behaviour, as seen by comparing (64) with (62).

It should be noted that under our assumptions (H1) and (H2), the bound (64) is implied by the bounds of Cérou et al. [2011, Theorem 5.1]. The results of Cérou et al. [2011] provide important information about the non-asymptotic-in-$n$ behaviour of the relative variance, which our Proposition 5 does not. On the other hand, Proposition 5 applies not just to the standard particle transition $\mathbf{M}$, but also to twisted transitions, to which the analysis of Cérou et al. [2011] does not extend. We shall now discuss these twisted transitions.



Continuing with the setting (63), and assuming that (H1) and (H2) hold, we shall now provide some interpretation of $h$. The objects appearing in part 1) of Proposition 2 have the following interpretations: $\Phi_n^{\theta^{-n}\omega}(\mu)$ is the one-step-ahead prediction filter initialized at time $-n$ using $\mu$, and run forward to time zero, thus conditioning on the observations $Y_{-n}(\omega),\ldots,Y_{-1}(\omega)$. Let us write this prediction filter probability measure as $P_n^{\omega} := \Phi_n^{\theta^{-n}\omega}(\mu)$. The quantity $Q_n^{\omega}(1)(x)$ is the conditional likelihood of observations $Y_0(\omega),\ldots,Y_{n-1}(\omega)$ given that the hidden state in the HMM at time zero is $x$. Thus if we denote by $\Pi_n^{\omega}$ the probability measure given by

$$\Pi_n^{\omega}(A) := \frac{\int_A P_n^{\omega}(dx)\,Q_n^{\omega}(1)(x)}{\int_{\mathsf{X}} P_n^{\omega}(dz)\,Q_n^{\omega}(1)(z)}, \quad A \in \mathcal{X},$$

we find by inspection of part 1) of Proposition 2 that $h$ can be interpreted as the point-wise limit

$$h(\omega,x) = \lim_{n\to\infty}\frac{d\Pi_n^{\omega}}{dP_n^{\omega}}(x). \tag{65}$$

Moreover, by part 2) of Proposition 2 we find that

$$\sup_{(\omega,x) \in \Omega\times\mathsf{X}}\left|\frac{d\Pi_n^{\omega}}{dP_n^{\omega}}(x) - h(\omega,x)\right| \le C\rho^n \tag{66}$$

for some constants $C < \infty$ and $\rho \in (0,1)$.

Let us now consider a twisted bootstrap particle filter (as per Section 1), in the case that for some fixed $\ell \ge 1$, we take $\psi^{\omega} := d\Pi_{\ell}^{\omega}/dP_{\ell}^{\omega}$, and as an instance of the setup in section 3 we let $\widetilde{\mathbf{M}}$ be as per (46) with this choice of $\psi^{\omega}$. Although typically unavailable in practice, this $\psi^{\omega}$ allows an illustrative application of Proposition 5. Indeed, using (66), and the fact that under the bounds of part 3) of Proposition 2, $h(\omega,x)$ is uniformly bounded above and below away from zero, elementary manipulations show that there exists some finite constant $C' < \infty$ such that

$$\Upsilon_N(\widetilde{\mathbf{M}}) \le \log\Big[1 + \frac{C'\rho^{\ell}}{N-1}\Big]. \tag{67}$$

We see that, in principle, increasing the lag length $\ell$ is useful in helping to control $\Upsilon_N(\widetilde{\mathbf{M}})$.
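To get a feel for how such bounds respond to the number of particles and to the lag length, the following sketch evaluates an expression of the form $\log[1 + c\rho^{\ell}/(N-1)]$. The form itself is taken as an assumption about (64) and (67), and the constants `c` and `rho` are hypothetical placeholders, since the analysis only asserts that suitable finite constants exist.

```python
import math

def growth_rate_bound(N, c, rho=1.0, lag=0):
    """Evaluate log(1 + c * rho**lag / (N - 1)); rho = 1 and lag = 0 give the untwisted form."""
    if N < 2:
        raise ValueError("the bound requires N >= 2")
    return math.log1p(c * rho ** lag / (N - 1))

c, rho = 5.0, 0.5                    # hypothetical constants
for N in (10, 100, 1000):
    print(N, growth_rate_bound(N, c), growth_rate_bound(N, c, rho, lag=5))
```

For large $N$ the bound behaves like $c\rho^{\ell}/N$, so under these assumptions increasing the lag by one has the same effect on the bound as multiplying the number of particles by roughly $1/\rho$.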

Now under the mild regularity condition (H3), for fixed $\omega$ and $n$, and $\varphi$ a bounded measurable function on $\mathsf{X}$, Corollary 1 shows that for the twisted particle filter,

$$\frac{1}{N}\sum_{i=1}^N \varphi(\zeta_n^i) - \pi_n^{\omega}(\varphi) \longrightarrow 0 \tag{68}$$

as $N \to \infty$, with probability one, independently of $\psi$. Furthermore, by Theorem 2, $N^{-1/2}\sum_{i=1}^N [\varphi(\zeta_n^i) - \pi_n^{\omega}(\varphi)]$ converges in distribution to a centered Gaussian random variable with variance independent of $\psi$, i.e. the same asymptotic variance obtained under the standard bootstrap particle filter.



4.3 Auxiliary particle filters 

In addition to the ingredients of the HMM given in section 1, introduce $r : \Omega \times \mathsf{X} \to \mathbb{R}_+$ such that for each $\omega$, $r^{\omega}(x)$ is strictly positive and bounded in $x$. Then set:

$$G^{\omega}(x) := \frac{g(x, Y_0(\omega))\int_{\mathsf{X}} r^{\theta\omega}(z)\,f(x,dz)}{r^{\omega}(x)}, \qquad M^{\omega}(x,dx') \propto f(x,dx')\,r^{\theta\omega}(x'). \tag{69}$$

In this case, sampling according to the corresponding $\mathbf{M}^{\omega}$, we obtain a form of Auxiliary Particle Filter (APF) [Pitt and Shephard, 1999; Johansen and Doucet, 2008; Douc et al., 2009]. More specifically, let $\{\tilde\mu_0^{\omega} \in \mathcal{P}(\mathsf{X}); \omega \in \Omega\}$ be the family of probability measures such that $\tilde\mu_0^{\omega}(dx) \propto r^{\omega}(x)\,\mu_0(dx)$, where $\mu_0$ is the initial distribution in the HMM, as in section 1. Then sampling

$$\zeta_0 \sim (\tilde\mu_0^{\omega})^{\otimes N}, \qquad \zeta_n \mid \zeta_{n-1} \sim \mathbf{M}^{\theta^{n-1}\omega}(\zeta_{n-1},\cdot), \quad n \ge 1, \tag{70}$$

it is straightforward to check that

$$\mu_0(r^{\omega})\,\bigg[\frac{1}{N}\sum_{i=1}^N \frac{1}{r^{\theta^n\omega}(\zeta_n^i)}\bigg]\prod_{p=0}^{n-1}\mathbf{G}^{\theta^p\omega}(\zeta_p) \tag{71}$$

is an unbiased estimator of $Z_n^{\omega}$, at least whenever the expectation of (71) is finite. If, for example, $r$ is bounded below away from zero, and (H1) and (H2) hold, then Proposition 4 may be applied to establish the existence of $\Upsilon_N(\mathbf{M}) \ge 0$ such that the following convergence holds $\mathbb{P}$-almost surely:

$$\frac{1}{n}\log \mathsf{E}_N^{\omega}\Bigg[\bigg(\mu_0(r^{\omega})\,\frac{1}{N}\sum_{i=1}^N \frac{1}{r^{\theta^n\omega}(\zeta_n^i)}\prod_{p=0}^{n-1}\mathbf{G}^{\theta^p\omega}(\zeta_p)\bigg)^2\Bigg] - \frac{2}{n}\log Z_n^{\omega} \longrightarrow \Upsilon_N(\mathbf{M}),$$

since the $\mu_0(r^{\omega})$ and $N^{-1}\sum_{i=1}^N [r^{\theta^n\omega}(\zeta_n^i)]^{-1}$ terms have no asymptotic contribution and since the convergence in Proposition 4 is independent of the distribution from which the particle system is initialized. If we choose $r(\omega,x) := g(x, Y_0(\omega))$ we obtain the "fully adapted" APF of Pitt and Shephard [1999]. Lemma 3 then shows that $\Upsilon_N(\mathbf{M}) = 0$ if and only if for $\mathbb{P}$-almost all $\omega$, $\int_{\mathsf{X}} g(z, Y_1(\omega))\,f(x,dz)$ is $\nu$-almost everywhere a constant. The latter condition usually does not hold in situations of practical interest. One may follow the generic recipe of section 3 and derive a twisted version of this fully adapted APF, but let us consider a different choice of $r$. Under the now familiar regularity conditions, the $\Omega$-indexed family of kernels given by

$$g(x, Y_0(\omega))\,f(x,dx') \tag{72}$$

have an associated eigen-function given by the point-wise limit on the right hand side of (65). Let us then choose

$$r(\omega,x) := \lim_{n\to\infty}\frac{d\Pi_n^{\omega}}{dP_n^{\omega}}(x), \tag{73}$$

so that

$$\int_{\mathsf{X}} g(x, Y_0(\omega))\,f(x,dx')\,r^{\theta\omega}(x') = \chi_{\omega}\,r^{\omega}(x)$$

for a non-negative random variable $\chi$ which is the eigen-value for the family of kernels (72). In this case we find that the $G^{\omega}$ given in (69) is constant in $x$ and Lemma 3 then shows that $\Upsilon_N(\mathbf{M}) = 0$, and the eigen-function of $Q^{\omega}$ is essentially constant. The interpretation of this result is that the variance growth rate behaviour of (71), with the particle system sampled according to the APF with $r$ given by (73), cannot be improved upon by twisting.
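The claim that the APF potential in (69) becomes constant when $r$ is the eigen-function of the kernels (72) is easy to check on a finite state space. The sketch below assumes a time-homogeneous two-state toy model (one fixed observation, so $g$ is a fixed vector) with hypothetical values; it also evaluates the fully adapted choice $r = g$ for comparison.

```python
g = [1.0, 0.4]                       # likelihood g(x, y) for one fixed observation y (hypothetical)
f = [[0.9, 0.1], [0.3, 0.7]]         # signal transition kernel f(x, x') (hypothetical)
S = range(2)

# Power iteration for the eigen-function r and eigen-value chi of the kernel g(x) f(x, x').
r = [1.0, 1.0]
for _ in range(500):
    Kr = [g[x] * sum(f[x][y] * r[y] for y in S) for x in S]
    chi = max(Kr)
    r = [v / chi for v in Kr]

# APF potential of the form (69): G(x) = g(x) * sum_z r(z) f(x, z) / r(x).
G_apf = [g[x] * sum(f[x][z] * r[z] for z in S) / r[x] for x in S]

# Fully adapted choice r = g, for which G(x) = sum_z g(z) f(x, z).
G_full = [sum(f[x][z] * g[z] for z in S) for x in S]

print(G_apf, chi, G_full)
```

With the eigen-function choice the potential equals the eigen-value at every state, whereas the fully adapted potential still varies with $x$ in this example.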

Now let us consider the empirical measures obtained under the APF, for some general choice of $r$. If one wishes to use $N^{-1}\sum_{i=1}^N \delta_{\zeta_n^i}$ to approximate the prediction filter $\pi_n^{\omega}$ in an $N \to \infty$ consistent manner then, in general and in contrast to (68), some re-weighting must be applied. For example, assuming (H3) holds with $G$ as in (69), Corollary 1 and some elementary manipulations involving the semi-group show that for bounded measurable $\varphi$,

$$\frac{\sum_{i=1}^N \varphi(\zeta_n^i)\,/\,r^{\theta^n\omega}(\zeta_n^i)}{\sum_{i=1}^N 1\,/\,r^{\theta^n\omega}(\zeta_n^i)} - \pi_n^{\omega}(\varphi) \longrightarrow 0, \tag{74}$$

as $N \to \infty$, with probability 1 under the law of the particle system specified by (69)-(70). Of course, (74) can be verified using the results of Johansen and Doucet [2008], Douc et al. [2009]. However, the new perspective which our results bring here is that, as we have seen in (67) and (68), the twisted bootstrap particle filter can in theory achieve $\Upsilon_N(\widetilde{\mathbf{M}}) \approx 0$ whilst not requiring re-weighting of particles to obtain consistent approximations of $\pi_n^{\omega}$.



4.4 Numerical illustrations 

In order to give some impression of the practical performance of the algorithms we have analyzed, we now present some numerical findings. (H2) is not satisfied for the $M$ and $G$ which specify the models below; in this section some of our theoretical findings can only be used as guidelines for the design of practical algorithms. We note however, that the much milder regularity condition (H3) is satisfied for the models we consider and thus Corollary 1 and Theorem 2 apply to the particle systems in question.

We shall first consider the influence of $\psi$ on the variance-growth behaviour of the twisted bootstrap particle filter (henceforth TPF). The purpose of this example is to illustrate an idealized scenario in which $\psi^{\omega} := d\Pi_{\ell}^{\omega}/dP_{\ell}^{\omega}$ can be computed exactly. Consider a linear-Gaussian state-space model where $X_n = 0.9X_{n-1} + V_n$, $Y_n = X_n + W_n$ where $(V_n)$, $(W_n)$ are iid zero mean Gaussian sequences. We shall consider a TPF with $\psi^{\omega} = d\Pi_{\ell}^{\omega}/dP_{\ell}^{\omega}$ for some $\ell \ge 0$, which can be computed in closed form for this model. Note that the TPF algorithm can be implemented with $\psi^{\omega}$ only known up to a constant of proportionality. Figure 1 shows variance growth behaviour estimated empirically using 2000 independent runs of the algorithm for a single observation sequence, which was drawn from the model and then fixed. Convergence of $n^{-1}\log \widetilde{\mathcal{V}}^{\omega}_{n,N}$ is apparent and the influence of $\ell$ on the rate of variance growth is substantial.
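For readers who wish to experiment with this example, the sketch below simulates the linear-Gaussian model and evaluates $\log\psi$ for a given lag, up to an additive constant. It rests on assumptions that are not stated in the text: unit noise variances, a chain started from $X_{-1} = 0$, and the identification of $\psi$ (up to proportionality) with the likelihood of the next $\ell$ observations given the current state, computed here by a Kalman recursion started from a known state. All function names are hypothetical.

```python
import math
import random

def simulate_lg(n, a=0.9, sd_v=1.0, sd_w=1.0, seed=0):
    """Simulate X_k = a X_{k-1} + V_k and Y_k = X_k + W_k for k = 0..n-1, with X_{-1} = 0 assumed."""
    rng = random.Random(seed)
    xs, ys = [], []
    x = 0.0
    for _ in range(n):
        x = a * x + rng.gauss(0.0, sd_v)
        xs.append(x)
        ys.append(x + rng.gauss(0.0, sd_w))
    return xs, ys

def log_psi(x, window, a=0.9, sd_v=1.0, sd_w=1.0):
    """Log-likelihood of the observations in `window` given that the hidden state at the
    start of the window equals x; an empty window gives 0, i.e. psi constant."""
    m, P = x, 0.0                    # the starting state is known exactly
    total = 0.0
    for k, y in enumerate(window):
        if k > 0:                    # propagate the state one step forward
            m, P = a * m, a * a * P + sd_v ** 2
        S = P + sd_w ** 2            # predictive variance of the observation
        total += -0.5 * math.log(2 * math.pi * S) - 0.5 * (y - m) ** 2 / S
        K = P / S                    # Kalman gain
        m, P = m + K * (y - m), (1 - K) * P
    return total

xs, ys = simulate_lg(10)
lag = 2
print([round(log_psi(x, ys[:lag]), 3) for x in (-1.0, 0.0, 1.0)])
```

Because only ratios of $\psi$ enter the twisted algorithm, the normalizing constant omitted here is immaterial, consistent with the remark that $\psi^{\omega}$ need only be known up to proportionality.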

We now turn to a standard stochastic volatility model, in which $d\Pi_{\ell}^{\omega}/dP_{\ell}^{\omega}$ is unavailable in closed form, but for which a standard deterministic (henceforth: "the") approximation is available. For details of the model, this approximation and a real data set of daily returns on pound/dollar exchange rates see Doucet et al. [2006] and references therein. We tested the TPF and APF using this data set and the same model parameter settings as in the aforementioned paper. We took both $\psi^{\omega}$ (for the twisted bootstrap filter) and $r^{\omega}$ (for the APF as in (69)) to be the approximation of $d\Pi_{\ell}^{\omega}/dP_{\ell}^{\omega}$.

Figure 1: Linear-Gaussian model and the TPF. Left: estimated values of $\widetilde{\mathcal{V}}^{\omega}_{n,N}$ against $n$. Right: estimated values of $n^{-1}\log \widetilde{\mathcal{V}}^{\omega}_{n,N}$ against $n$. In both cases $N = 100$ and $\omega$ is fixed. Lag-lengths: red, $\ell = 0$ (i.e. the bootstrap filter); yellow, $\ell = 1$; green, $\ell = 2$; cyan, $\ell = 5$. The $\ell = 0$ plot is omitted from the right-hand figure due to scale constraints. To the precision of these figures, increasing the lag beyond $\ell = 5$ had no noticeable influence on the variance.

Figure 2: Stochastic Volatility. Estimated values of $\widetilde{\mathcal{V}}^{\omega}_{n,N}$ against $n$, with $N = 1000$ and $\omega$ fixed. Left: TPF. Right: APF. Red, $\ell = 0$ (the bootstrap algorithm); yellow, $\ell = 1$; cyan, $\ell = 2$; blue, $\ell = 5$; green, $\ell = 10$; violet, $\ell = 50$.

Figure 3: Stochastic Volatility model. Left: estimated $\widetilde{\mathcal{V}}^{\omega}_{n,N}$ for the TPF against $n$ for $\ell = 5$, $\omega$ fixed and: red, $N = 10$; yellow, $N = 100$; green, $N = 1000$. Right: empirical variance of particle approximations of the mean of $\pi_n^{\omega}$ against $n$, with $N = 1000$ and $\omega$ fixed. Cyan, TPF; red, APF. For both algorithms, $\ell = 5$.



Figure 2 shows empirical variance-growth behaviour for a range of values of $\ell$, estimated from 2000 independent runs of each algorithm. For both algorithms, increasing $\ell$ appears to generally yield a decrease in variance. The figures indicate that apart from occasional fluctuations, the APF mostly exhibits lower variance than the TPF. However, we found this phenomenon to be dependent on model parameter settings: for other parameter values we found the TPF exhibited lower variance than the APF (not shown).

The left plot in Figure 3 illustrates how the variance of the estimates from the TPF varies with $N$. An increase in variance growth rate is evident as $N$ is decreased. The right plot of Figure 3 shows the empirical variance of particle estimates of the mean of the prediction filter $\pi_n^{\omega}$ against $n$, obtained from the TPF and APF both with $\ell = 5$. It is notable that here the TPF generally exhibits lower variance than the APF. Results for the standard bootstrap particle filter were found to be identical to those for the TPF on the scale of this figure. This is in agreement with the conclusions of Theorem 2 applied to the TPF, namely that the asymptotic variance of prediction filter approximations is independent of $\psi$.



4.5 Generalizations and extensions 

• We have only mentioned the multinomial resampling scheme, appearing implicitly in the definition of $\mathbf{M}$ given in (33). Several alternative schemes are popular in practice. In order to develop extensions of Proposition 4 and Theorem 1 to alternative schemes (assuming resampling is applied at every time step, and with a fixed number of particles) it suffices to re-define $\mathbf{M}$ appropriately so that it incorporates the resampling scheme of interest, check that the conditions in the statement of Lemma 1 are satisfied, and check that Lemma 2 holds with $\mathbf{Q}$ re-defined in terms of this new $\mathbf{M}$. Of course, the new $\mathbf{M}$ will influence the form of the corresponding twisted algorithms.

• Some types of standard particle algorithm, and variants of the APF, resample according to weights which depend on two or more historical components of the trajectory of each particle. Such algorithms can be incorporated into the framework presented here by a simple state-space augmentation. For example, starting from each Markov kernel $M^{\omega}$ on $(\mathsf{X}, \mathcal{X})$ (according to which particles are sampled in the algorithm of interest) one builds a kernel, $\bar{M}^{\omega}(x,dz) := \delta_{x_2}(dz_1)\,M^{\omega}(z_1,dz_2)$ on $(\mathsf{X}^2, \mathcal{X}^{\otimes 2})$, where $x = (x_1,x_2)$, $z = (z_1,z_2)$ are points in $\mathsf{X}^2$, and introduces the appropriate incremental weight $\bar{G}^{\omega}(x)$. Then the analyses of Section 2 can be repeated with only superficial differences: when $M^{\omega}$ satisfies (18), then $\bar{M}^{\omega}$ satisfies a 2-step version of the same condition, and one simply then works on $(\mathsf{X}^2, \mathcal{X}^{\otimes 2})$ instead of $(\mathsf{X}, \mathcal{X})$, dealing with the kernel $\bar{Q}^{\omega}(x,dz) := \bar{G}^{\omega}(x)\,\bar{M}^{\omega}(x,dz)$ instead of $Q^{\omega}$.



5 Proofs and auxiliary results 
Proofs for section 2.2

Before presenting the proof of Proposition [2] it is opportune to observe that for any n > 1: 



(75) 



The formula is validated by noticing that $q = Id, = and when (75 1 holds at rank n. using the 
definition of $5t!+ii composing $^ " on the left of the objects appearing in (75 1 and then using the definition 
of gives the equalities: 

, n = o = O)""" o 1 o = 0)^" o 



SO that the formula (75 1 holds at rank n + 1. A simple inductive arguments then shows that for any n,m > 1 
and any cj G 17, 

(76) 



= $f o 

n+m n ri 



Proof (of Proposition 2). Throughout the proof $C$ is a finite constant whose value may change on each appearance.

We first address (|26|) and (28 1. Applying (76) with 9^'^^"^uj in place of uj gives 



SO for any /i G 7^(X) , taking /j,' = <i>^ 



sup sup 

wenpe-p(x) 



(/i) then applying (251 we obtain, for any n,m>l, 



(77) 



Taking ip — Ia for any A ^ X , (77 \ shows that for some fixed and w, "(/i)(A); n > o| is a real- valued 

Cauchy sequence and together with [Grey 2001 Theorem 1] this establishes the existence of r/" G V{X) such 
that (26 1 holds. Moreover, taking m — )■ oo in ( 77 1 we then find that 



sup sup 



it^)-v^ M < MCp 



(78) 



and thus "(m) converges in total variation to 77", uniformly over /i and w. This establishes (28 1 
We next address ( 27 1 and ( 29 ) . We shall establish that 



sup sup 

{ui,x)eQxXne'P{x) 



First note that under (H2), 

satisfies for any u!,Ld' ,x,x' fj,, 
sup 



n+1 



1m)Q^+i(i) 



Qn(l)(^) 

r""(M)Q^(i) 



(79) 



I7l K,{X') n^l <i>r""(A^)Q5^(l) 



g-(l)(a;) $r""'(A.)Q-'(l) 



< 



(80) 
We have



Qn+l(l)(^) 



<+r'"(/^)Q^+i(i) *r""(M)Q^(i) 

Q;^+i(i)(a^)<i'f;""(M)Q"(i) - Q^(i)(xX;T^-(/i)Q-+i(i) 



< 



Q-(l)(a:)$r"-(A*)Q^+i(l) - Q^(l)(a;)<i>«;'r'"(Ai)Q^^+i(l 



n+l 



^(M)Q5^+i(i)$r""(M)Q5^(i) 



+ 



Q5^(l)(a;)<i>r"-(A*)Q-(l) 
lA*)Q5^+i(l)-'i>'rr''^(M)Q"+i(l) 



<i>r+"r'"(M)Q^+i(i) 

Q^+i(i)(x) $f;""(M)Q-+i(i) 



-""(A*)g^(i) 



lA^)Qn+l(l) 



o 



if^) Q' "(1) 



where (80 1, (H2) and (25 1 have been appHed. Thus (79 1 is proved and then for any m,n> 1, 

m— 1 



sup 



c 



Thcn\h';:jx);n>l 



I is Cauchy 



q=0 



and real-valued, which is enough to establish the existence oi h : fl xX 



such that (27 1 holds, and furthermore 



sup 



(81) 



which establishes (29 1. The measurability of h stated in part 1) holds as it is the point- wise limit of a 
sequence oi F ® A'-measurable functions. 

Turning now to prove part 3) of the Proposition, note that by ( 78 1 , A^^ = jy^j (G" ) = lim„_j.oo (m) (G"^ ) , 

i.e. A is the point-wise limit of a sequence of measurable functions, and is therefore measurable. The A part 
of (30l holds immediately under (H2), and the h part of (30l holds due to ( [SO] ) and (81 1. 

We now turn to the proof of part 4). We now establish that the triple (r/, /i, A) does indeed satisfy (31 1. 
Firstly, for the measure equation, we have 



1A^)Q"(^) 



$r"'^(M)(G") 



"(^))(^) = <+i ^-(m)(A) 



(82) 



By the strong convergence in (78), and the fact that under (H2), $\sup_x Q^{\omega}(A)(x) \le \sup_x G^{\omega}(x) < \infty$, the left hand side of (82) converges to $\eta^{\omega}Q^{\omega}(A)/\lambda_{\omega}$ and the right hand side of (82) converges to $\eta^{\theta\omega}(A)$. Thus $\eta$ satisfies the first equation in (31). For the second equation in (31), choose $n \ge 1$ arbitrarily and notice that



(x) = 



^+^"(A.)g^(i) 



(83) 



(m)Q^(i) 



n+l 



n~l 
n-l 



ri-l 



— 7> ft."(a;) • 1 • A(^, as n — 7^ oo. 



ji-i 



(84) 



where the convergence in (84) is due to (811; (251 applied in conjunction with the property 



n>lx,x' Qn+lWi^' / 



< OO 



which holds under (H2); and Furthermore the l.h.s. of ^ converges to Q"(/i''")(a;) by (|81|, ([801 and 
the dominated convergence theorem. This verifies that the second equation of (31 1 is satisfied. For the third 
equation, we have, for any n > 1, 



< 



sup 



and since we have already proved part 3), which implies sup^ h^{x) < oo, the convergence in ( |78| and (81 1 
show it must be the case that rf{h'^) = 1. 

Now for the uniqueness element of part 4). Suppose that there exists a triple (f],h,Aj of the desired 
nature and such that 



fj'^Q'^ = X^f]'^'^^ Q'^ih"'^) = X^h'^, fj'^ih'^) = 1, for all u € il. 



(85) 



Then integrating the first equation in (85 1, we have A^^ = 'nu>{G'^)i because 77^ is, by hypothesis, a probability 
measure. Thus $'^(^") = t)^^ for any w, so via iteration we find "'^(77^ — and strong convergence 
of (78) then demands that ff^ = rf . Thus fj = t] and therefore A = A. It remains to show that h = h. To 
this end, first note that 



h-{x)r,'"'^{y,) 



^epui 



< 



+ 11^^11 



— 0, as n— > 00 



h^{x) 



(86) 



where the identity 0^=0 ^SPu) — v'^Qni^) = '^'^{v^ "'^)Qni^); which holds due to the measure equation in 
(31 1; then (H2); and then (78 1 and (81 1 have been applied. But, under the hypotheses that Q^ihP'^) 
and ff^{h'^) = 1 for all w, using the already proved fj = rj, , we have the equality 

(/!«"") (a:) 



h'^ix)!]" '^{h' ") = h'^ix) - h'^ix), 
so taking Lp = h^^'^jTf^'^ih^"'^) in (86), and noting that under (H2),



we find = h'^{x). This completes the proof of part 4), and therefore the proposition. 

Proof. ( of Proposition 



□ 



/^Q^i(i) 



in 

p=i 

n— 1 n— 1 

n <i';^(A*)Q'"'"(i) = n *p(A^)(G'''"), 

p=0 p=0 



and so 



1 



n-l 



log/xQ-(l) = 



Elog AffP^ 



ig[iog [<i>;^(^)(G'' 

p=0 



log A, 



(87) 



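Now the $\lambda$ part of (28) ensures that $\mathbb{E}[|\log \lambda|] < \infty$, so by the ergodic theorem, the first term on the right of (87) converges:

$$\frac{1}{n}\sum_{p=0}^{n-1} \log \lambda_{\theta^p\omega} \to \mathbb{E}[\log \lambda], \qquad \mathbb{P}\text{-a.s.}$$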



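For the other term, fix $\omega \in \Omega$. Then by (28), for all $\epsilon > 0$, there exists $n_0(\epsilon)$ such that for any $\omega' \in \Omega$ and $n \geq n_0(\epsilon)$,

$$\left|\Phi_n^{\theta^{-n}\omega'}(\mu)(G^{\omega'}) - \lambda_{\omega'}\right| < \epsilon,$$

and then choosing $\omega' = \theta^n\omega$,

$$\left|\Phi_n^{\omega}(\mu)(G^{\theta^n\omega}) - \lambda_{\theta^n\omega}\right| < \epsilon.$$

Therefore

$$\log\left[\Phi_n^\omega(\mu)(G^{\theta^n\omega})\right] - \log \lambda_{\theta^n\omega} \to 0, \qquad \text{as } n \to \infty,$$

and so the second term on the right of (87) converges to zero by Cesàro averaging. The proof is complete upon recalling from Proposition 1 that $n^{-1}\log \mu Q_n^\omega(1) \to \Lambda$, $\mathbb{P}$-a.s., and noting that by Proposition 2, $Q^\omega(h^{\theta\omega}) = \lambda_\omega h^\omega$. □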
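The telescoping step in this proof can be checked numerically. The following is a minimal sketch, assuming a finite state space and a fixed (non-random) environment, so that $Q(i,j) = G(i)M(i,j)$ for a row-stochastic matrix $M$; the function names and the two-state numbers are hypothetical and chosen only for illustration.

```python
import math

def apply_Q(mu, G, M):
    """Return the measure mu Q, where Q(i, j) = G[i] * M[i][j]."""
    d = len(mu)
    return [sum(mu[i] * G[i] * M[i][j] for i in range(d)) for j in range(d)]

def log_mass_direct(mu, G, M, n):
    """log mu Q_n(1), computed by propagating the unnormalised measure."""
    v = list(mu)
    for _ in range(n):
        v = apply_Q(v, G, M)
    return math.log(sum(v))

def log_mass_telescoped(mu, G, M, n):
    """sum_{p<n} log Phi_p(mu)(G), using the normalised flow Phi_p(mu)."""
    phi, total = list(mu), 0.0
    for _ in range(n):
        total += math.log(sum(p * g for p, g in zip(phi, G)))
        nxt = apply_Q(phi, G, M)
        mass = sum(nxt)
        phi = [a / mass for a in nxt]  # Phi_{p+1}(mu)
    return total

# Hypothetical two-state example (values are assumptions for illustration).
G = [0.5, 2.0]
M = [[0.9, 0.1], [0.3, 0.7]]
mu = [0.5, 0.5]
for n in (10, 100, 1000):
    # The time-average of the log terms settles down as n grows.
    print(n, log_mass_telescoped(mu, G, M, n) / n)
```

The telescoped form is the one to use for large $n$, since the unnormalised mass itself overflows or underflows in floating point while the individual terms $\Phi_p(\mu)(G)$ stay bounded.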

Proofs for section 2.4

Throughout this section we assume $N \geq 1$ is fixed arbitrarily.

For $\widetilde{\mathbf{M}} \in \mathbb{M}$, define

$$\mathbf{R}(\omega, x, dx') := \mathbf{G}(\omega, x)^2 \phi^\omega(x, x')^2 \widetilde{\mathbf{M}}(\omega, x, dx'). \qquad (88)$$

The proofs of Proposition 4 and Theorem 1 involve a generalized eigenvalue and eigenfunction for $\mathbf{R}$, and our first objective is to verify that such quantities exist. In order to do so, we now check that when (H2) holds, $\mathbf{R}$ satisfies a regularity condition of a similar form. Define



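$$\mathbf{J}(\omega, x) := \int_{\mathsf{X}^N} \mathbf{G}(\omega, x)^2 \phi^\omega(x, x')^2 \widetilde{\mathbf{M}}(\omega, x, dx')$$

and

$$\mathbf{L}(\omega, x, dx') := \frac{\mathbf{G}(\omega, x)^2 \phi^\omega(x, x')^2 \widetilde{\mathbf{M}}(\omega, x, dx')}{\mathbf{J}(\omega, x)},$$

so clearly

$$\mathbf{R}(\omega, x, dx') = \mathbf{J}(\omega, x)\,\mathbf{L}(\omega, x, dx').$$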



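Lemma 6. Assume (H2), let $\widetilde{\mathbf{M}}$ be any member of $\mathbb{M}$ and let $\tilde\nu$ be the accompanying measure in (38). Then there exist constants $\alpha \in [1, \infty)$, $(\delta_-, \delta_+) \in (0, \infty)^2$ and $\mu \in \mathcal{P}(\mathsf{X}^N)$, such that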



•J [UJ , X j 



^{dx) oc 



dv^ 



dv 



-{x) 



v{dx) 



and 



(89) 
(90) 



5^li{-) < h{Lu, X, •) < 5+Ai(-), V (w, x) G X X 



N 



Proof. For any A G A"®^, 



G"(x)2,/)"(x,a;')^M'^(a:,da;') 



= G^{xf 



dM-jx,-) , d^ dV 



M"(x,dx') 



< sup G"'(z)2^i4^ 
(a;',z)enxx« 



dv 



v{dx') < oo, 



where Lemma [T] (H2) and the definition of M have been used. By a similar argument, 

G'^{xf(l}'^ix,x'fM'^{x,dx') 

-dv'^^ 



> inf G" (zY 

((^',z)eJ7xX" 



dv 



z/(c?x ). 



Taking A = in (92) and (93 1, the bound of (89 1 holds with 



a = f3' 



,2N ~3 



The bound of (91) holds with 



fi{dx) oc 



dv 



(x) 



v{dx) 



(91) 



(92) 



(93) 



(5_ 



<5+ = 



inf G"'(z)2^— ^ 



dv 



{x') 



v{dx' 



sup G"'(z)2 ^ 

(w',2)er!xx" 



— ) 
dv 



v{dx') 



□ 
Remark 9. Having established the regularity properties (89)-(91), we notice that $\mathbf{R}$ will have properties which are exactly similar to those properties of $Q$ established in Propositions 1, 2 and 3, which we shall now summarize (we will not write a proof explicitly, since the arguments follow precisely the same programme as in the proofs of the aforementioned Propositions).
For any $N \geq 1$ and $\widetilde{\mathbf{M}} \in \mathbb{M}$,



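• there exists a constant $\Xi_N \in (-\infty, \infty)$ such that for any $\mu \in \mathcal{P}(\mathsf{X}^N)$,

$$\frac{1}{n}\log \mu \mathbf{R}_n^\omega(1) \to \Xi_N, \qquad \text{as } n \to \infty, \text{ for } \mathbb{P}\text{-a.a. } \omega; \qquad (94)$$

• there exists a random variable $\xi : \Omega \to \mathbb{R}_+$ (depending on $N$) and a function $\ell : \Omega \times \mathsf{X}^N \to \mathbb{R}_+$ which is measurable w.r.t. $\mathcal{F} \otimes \mathcal{X}^{\otimes N}$ and such that

$$\sup_{\omega, \omega'} \frac{\xi_\omega}{\xi_{\omega'}} < \infty, \qquad \sup_{\omega, \omega', x, x'} \frac{\ell^\omega(x)}{\ell^{\omega'}(x')} < \infty, \qquad (95)$$

$$\mathbf{R}^\omega(\ell^{\theta\omega})(x) = \xi_\omega \ell^\omega(x), \qquad (96)$$

and, for any $x \in \mathsf{X}^N$,

$$\Xi_N = \mathbb{E}[\log \xi] = \int_\Omega \log \frac{\int_{\mathsf{X}^N} \mathbf{R}^\omega(x, dx')\,\ell(\theta\omega, x')}{\ell^\omega(x)}\,\mathbb{P}(d\omega). \qquad (97)$$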



Note that in the above displays, the dependence of various quantities on $\widetilde{\mathbf{M}}$ is suppressed from the notation. We can now deal with the proof of Proposition 4 and then a collection of Lemmas which prove Theorem 1.

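Proof. (of Proposition 4) By Proposition 1, for any $\mu_0 \in \mathcal{P}(\mathsf{X})$,

$$\frac{2}{n}\log \mu_0 Q_n^\omega(1) \to 2\Lambda, \qquad \text{for } \mathbb{P}\text{-a.a. } \omega,$$

and by (94),

$$\frac{1}{n}\log \mu_0^{\otimes N} \mathbf{R}_n^\omega(1) \to \Xi_N, \qquad \text{for } \mathbb{P}\text{-a.a. } \omega. \qquad (98)$$

Then as

$$V_{n,N}^\omega = \frac{\mu_0^{\otimes N} \mathbf{R}_n^\omega(1)}{\left[\mu_0 Q_n^\omega(1)\right]^2},$$

the proof is complete, with $\Upsilon_N(\widetilde{\mathbf{M}}) = \Xi_N - 2\Lambda$.

□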
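The identity "rate of the second-moment kernel minus twice the rate of $Q$" can be illustrated on a toy case. This is only a sketch under strong simplifying assumptions that are not part of the text: a finite state space, a fixed (non-random) environment, $N = 1$, and the untwisted choice so that the second-moment kernel reduces to $R(i,j) = G(i)^2 M(i,j)$. In that setting both growth rates are logarithms of spectral radii. The helper names are hypothetical.

```python
import math

def spectral_radius(A, iters=2000):
    """Power iteration for an entrywise-positive square matrix A."""
    d = len(A)
    h = [1.0] * d
    lam = 1.0
    for _ in range(iters):
        Ah = [sum(A[i][j] * h[j] for j in range(d)) for i in range(d)]
        lam = max(Ah)               # current eigenvalue estimate
        h = [a / lam for a in Ah]   # renormalise so that max(h) == 1
    return lam

def growth_rate(G, M):
    """log rho(R) - 2 log rho(Q) with Q = diag(G) M and R = diag(G^2) M."""
    d = len(G)
    Q = [[G[i] * M[i][j] for j in range(d)] for i in range(d)]
    R = [[G[i] ** 2 * M[i][j] for j in range(d)] for i in range(d)]
    return math.log(spectral_radius(R)) - 2.0 * math.log(spectral_radius(Q))

M = [[0.9, 0.1], [0.3, 0.7]]          # assumed row-stochastic kernel
print(growth_rate([1.3, 1.3], M))     # constant potential
print(growth_rate([0.5, 2.0], M))     # non-constant potential
```

In this toy setting the rate vanishes for a constant potential and is strictly positive otherwise, which matches the qualitative message of the results proved in this section.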



The proof of Theorem 1 is now given in three Lemmas. The first can be viewed as generalizing the necessity part of the proof of [Bucklew et al., 1990, Theorem 3] to the case of non-negative kernels driven by an ergodic shift.

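Lemma 7. 1) ⇒ 2). If $\Upsilon_N(\widetilde{\mathbf{M}}) = 0$, then for $\mathbb{P}$-almost all $\omega \in \Omega$ there exists $A_\omega \in \mathcal{X}^{\otimes N}$ such that $\nu^{\otimes N}(A_\omega^c) = 0$ and for any $x \in A_\omega$,

$$\widetilde{\mathbf{M}}^\omega(x, B) = \frac{\int_B \mathbf{M}^\omega(x, dx')\,\mathbf{h}^{\theta\omega}(x')}{\int_{\mathsf{X}^N} \mathbf{M}^\omega(x, dz)\,\mathbf{h}^{\theta\omega}(z)}, \qquad \text{for all } B \in \mathcal{X}^{\otimes N}.$$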



Proof. We need to introduce a notational convention before proceeding with the main body of the proof. For any $\varphi : \Omega \times \mathsf{X}^N \to \mathbb{R}$ a function measurable w.r.t. $\mathcal{F} \otimes \mathcal{X}^{\otimes N}$, let the $\mathbb{P}$-essential supremum of the collection of functions $\{\varphi(\cdot, x); x \in \mathsf{X}^N\}$ (in the sense of [Doob, 1994, V, 18.]) be, say, $\chi$, i.e. $\chi$ is a random variable on $(\Omega, \mathcal{F}, \mathbb{P})$. In a slight abuse of our $\omega$-section notation we shall write, for any $\omega$ in $\Omega$,

$$\operatorname{ess\,sup}_x \varphi^\omega(x) := \chi_\omega. \qquad (99)$$
Let $\gamma(\omega, x) = \mathbf{h}(\omega, x)^2/\ell(\omega, x)$, let $\rho$ be the $\mathbb{P}$-essential supremum of the collection of functions $\{\gamma(\cdot, x); x \in \mathsf{X}^N\}$, and then in accordance with the convention (99) we shall write:

$$\rho_\omega = \operatorname{ess\,sup}_x \frac{\mathbf{h}^\omega(x)^2}{\ell^\omega(x)}.$$

From (95), the definition of $\mathbf{h}$ and (30),

$$\sup_{\omega, \omega', x, x'} \frac{\gamma(\omega, x)}{\gamma(\omega', x')} = \sup_{\omega, \omega', x, x'} \frac{\mathbf{h}^\omega(x)^2}{\ell^\omega(x)} \frac{\ell^{\omega'}(x')}{\mathbf{h}^{\omega'}(x')^2} < \infty \qquad (100)$$

and therefore, at least up to a set of $\mathbb{P}$-measure zero, $\rho$ is uniformly bounded above and below away from zero in $\omega$. This observation, the bound (100) and Lemma 6 will ensure that various expectations appearing below are finite.

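We now proceed with the proof. Since by Proposition 2 and (95), $\lambda$ and $\xi$ are uniformly bounded above and below away from zero in $\omega$, we may write $\Upsilon_N(\widetilde{\mathbf{M}}) = \Xi_N - 2\Lambda = \mathbb{E}\left[\log \frac{\xi}{\lambda^2}\right]$ as in the above proof of Proposition 4. We are going to prove that the condition

$$\mathbb{E}\left[\log \frac{\xi}{\lambda^2}\right] = 0 \qquad (101)$$

implies for $\mathbb{P}$-almost all $\omega \in \Omega$ there exists $A_\omega \in \mathcal{X}^{\otimes N}$ such that $\nu^{\otimes N}(A_\omega^c) = 0$ and for any $x \in A_\omega$,

$$\widetilde{\mathbf{M}}^\omega(x, B) = \frac{\int_B \mathbf{M}^\omega(x, dx')\,\mathbf{h}^{\theta\omega}(x')}{\int_{\mathsf{X}^N} \mathbf{M}^\omega(x, dz)\,\mathbf{h}^{\theta\omega}(z)}.$$

The following holds for $\mathbb{P}$-almost all $\omega$ and any $x \in \mathsf{X}^N$,

$$\xi_\omega = \frac{\mathbf{R}^\omega(\ell^{\theta\omega})(x)}{\ell^\omega(x)} \geq \frac{1}{\ell^\omega(x)} \frac{1}{\rho_{\theta\omega}} \int \mathbf{G}^\omega(x)^2 \phi^\omega(x, x')^2 \widetilde{\mathbf{M}}^\omega(x, dx')\,\mathbf{h}^{\theta\omega}(x')^2$$

$$\geq \frac{1}{\ell^\omega(x)} \frac{1}{\rho_{\theta\omega}} \left[\int \mathbf{G}^\omega(x) \phi^\omega(x, x') \widetilde{\mathbf{M}}^\omega(x, dx')\,\mathbf{h}^{\theta\omega}(x')\right]^2 \qquad (102)$$

$$= \frac{1}{\ell^\omega(x)} \frac{1}{\rho_{\theta\omega}}\,\lambda_\omega^2\,\mathbf{h}^\omega(x)^2, \qquad (103)$$

where the first equality is just (96), the inequality (102) is due to Jensen's inequality and the equality in (103) holds due to Lemma 2. Therefore the following inequalities hold for $\mathbb{P}$-almost all $\omega$:

$$\xi_\omega \geq \frac{1}{\rho_{\theta\omega}} \operatorname{ess\,sup}_x \frac{1}{\ell^\omega(x)} \int \mathbf{G}^\omega(x)^2 \phi^\omega(x, x')^2 \widetilde{\mathbf{M}}^\omega(x, dx')\,\mathbf{h}^{\theta\omega}(x')^2 \geq \frac{\lambda_\omega^2}{\rho_{\theta\omega}} \operatorname{ess\,sup}_x \frac{\mathbf{h}^\omega(x)^2}{\ell^\omega(x)}. \qquad (104)$$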
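Using consecutively the inequalities in (104),

$$\mathbb{E}[\log \xi] \geq \int_\Omega \log\left[\frac{1}{\rho_{\theta\omega}} \operatorname{ess\,sup}_x \frac{1}{\ell^\omega(x)} \int \mathbf{G}^\omega(x)^2 \phi^\omega(x, x')^2 \widetilde{\mathbf{M}}^\omega(x, dx')\,\mathbf{h}^{\theta\omega}(x')^2\right] \mathbb{P}(d\omega)$$

$$= \int_\Omega \log\left[\operatorname{ess\,sup}_x \frac{1}{\ell^\omega(x)} \int \mathbf{G}^\omega(x)^2 \phi^\omega(x, x')^2 \widetilde{\mathbf{M}}^\omega(x, dx')\,\mathbf{h}^{\theta\omega}(x')^2\right] \mathbb{P}(d\omega) - \int_\Omega \log \rho_{\theta\omega}\,\mathbb{P}(d\omega)$$

$$= \int_\Omega \log\left[\operatorname{ess\,sup}_x \frac{1}{\ell^\omega(x)} \int \mathbf{G}^\omega(x)^2 \phi^\omega(x, x')^2 \widetilde{\mathbf{M}}^\omega(x, dx')\,\mathbf{h}^{\theta\omega}(x')^2\right] \mathbb{P}(d\omega) - \mathbb{E}[\log \rho] \qquad (105)$$

$$= \int_\Omega \log\left[\frac{1}{\rho_\omega} \operatorname{ess\,sup}_x \frac{1}{\ell^\omega(x)} \int \mathbf{G}^\omega(x)^2 \phi^\omega(x, x')^2 \widetilde{\mathbf{M}}^\omega(x, dx')\,\mathbf{h}^{\theta\omega}(x')^2\right] \mathbb{P}(d\omega)$$

$$\geq \int_\Omega \log\left[\frac{\lambda_\omega^2}{\rho_\omega} \operatorname{ess\,sup}_x \frac{\mathbf{h}^\omega(x)^2}{\ell^\omega(x)}\right] \mathbb{P}(d\omega) \qquad (106)$$

$$= \mathbb{E}[\log \lambda^2] = \mathbb{E}[\log \xi], \qquad (107)$$

where (105) holds as $\theta$ preserves $\mathbb{P}$; (106) holds due to (104) and because $\theta$ preserves $\mathbb{P}$; and the final equality in (107) holds by hypothesis (since we are trying to prove (101)). Thus we conclude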



E[loge] = / lo; 



1 1 

— ess sup „ , , 

X e^ix) 



log 



1 

— ess sup 

Plj X 



G"{xf^'^{x, x'fM^ix, dx')h^"(a;')' 
R" ([h^'^]^) (x) 



V{duj) 



i'^ix) 



Now for e > and a; G 57, introduce 
Then for P-alniost all lu and all x e X^, 



P{duj) 



^^h-ixf 



(108) 



r(.)>^+.^I.,Jx). 



Pu. 



Now by Proposition 2 and the definition of $\mathbf{h}$, we know $\inf_{\omega, x} \mathbf{h}(\omega, x) > 0$, and similarly, by (95), $\inf_{\omega, x} \ell(\omega, x) > 0$. Combined with Lemma 6 this ensures that there is a strictly positive constant $\kappa$ independent of $\epsilon$ and $\omega$ such that



_ (^^") jx) ( [h'l 



R"(x,da;')h'^"(a;')' 



> 



R" ([h""]^) (a;) 



N 



peuj£'^{x) peuj 

where $\mu$ is as in Lemma 6. Thus we find, once again using the fact that $\theta$ preserves $\mathbb{P}$,



E[logC] > / log 



1 R- C 



— ess sup ■ 

Puj X 



i'^ix) 



Pu 



P(dcj), 



but we have already proved (1081, and since e, k and are strictly positive and finite we deduce that 

P({..:M^LJ-0}) =1. 



(109) 



Now for any w G if fJ.{Ag^ J — 0, then by (90 1 [dv®'^ / dv) {x) is zero for v-'a.'a. x € Ag^ ^, and then 
^'^^{Ag^J = 0. Therefore from (IToi) we find 



1 = P ({c : :^^^(AL J = 0}) = P ({a. : .«^(A,.,) = l}) ^ P ({^ : z.«^(A.,) = l}) , 



(110) 
where the final equality holds since $\theta$ preserves $\mathbb{P}$.

From the definition of it is clear that for any w, 



4* 



and 



(111) 



(112) 



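Now suppose that for some $\omega \in \Omega$, $\nu^{\otimes N}(A_{\omega, 1/m}) = 1$ for all $m \geq 1$. Then in light of (111) and (112), continuity of probability under $\nu^{\otimes N}$ dictates $\nu^{\otimes N}(A_\omega^*) = 1$. On the other hand, if $\nu^{\otimes N}(A_\omega^*) = 1$, then it must be that $\nu^{\otimes N}(A_{\omega, 1/m}) = 1$ for all $m \geq 1$. Therefore

$$\left\{\omega : \nu^{\otimes N}(A_\omega^*) = 1\right\} = \bigcap_{m \geq 1} \left\{\omega : \nu^{\otimes N}(A_{\omega, 1/m}) = 1\right\}. \qquad (113)$$

Evaluating $\nu^{\otimes N}$ on the sets in (112) we find

$$\left\{\omega : \nu^{\otimes N}(A_{\omega, 1/(m+1)}) = 1\right\} \subseteq \left\{\omega : \nu^{\otimes N}(A_{\omega, 1/m}) = 1\right\}, \qquad \forall m \geq 1. \qquad (114)$$

Then by (110), (113), (114) and continuity of probability under $\mathbb{P}$,

$$\mathbb{P}\left(\left\{\omega : \nu^{\otimes N}(A_\omega^*) = 1\right\}\right) = 1. \qquad (115)$$

Now (115) ensures that, for $\mathbb{P}$-a.a. $\omega$, $A_\omega^*$ is non-empty, so it is legitimate to write, for any $x \in A_\omega^*$,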

{l^'^) {x) h"(j;)2 



A2 



^"(a;) [Q-(h»-)(a:) 



[Q'^(h»")(a 



G"(x)2/ 0"(x,a;')^M'^(a;,da;')h^"(a;' 



,n2 



(a;)2 0"(a;,a;')M"(x,dx')h''"(a;'; 



> 



Pea 



(116) 



where the final inequality is due to Jensen's inequality. Therefore we have proved 



> 1, 



However, by hypothesis, E 



log# 



E 



log 



PuJ 



— 0, and combined with the fact that 6 preserves P we find 



E 



log 



therefore it must be the case that in fact 



^Lj P8u 

^ij Pw 



1, 



■E[logpe^] -E[logp^] =0, 



a.a. u). 



(117) 



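Equation (117) implies equality must hold in the instance of Jensen's inequality (116), i.e. for $\mathbb{P}$-almost all $\omega$, and all $x \in A_\omega^*$,

$$\phi^\omega(x, x')\,\mathbf{h}^{\theta\omega}(x') = c^\omega(x), \qquad \widetilde{\mathbf{M}}^\omega(x, \cdot)\text{-a.s.},$$

and therefore

$$\widetilde{\mathbf{M}}^\omega(x, B) = \frac{1}{c^\omega(x)} \int_B \mathbf{M}^\omega(x, dx')\,\mathbf{h}^{\theta\omega}(x'), \qquad \text{for any } B \in \mathcal{X}^{\otimes N}.$$

Normalization then dictates

$$\widetilde{\mathbf{M}}^\omega(x, B) = \frac{\int_B \mathbf{M}^\omega(x, dx')\,\mathbf{h}^{\theta\omega}(x')}{\mathbf{M}^\omega(\mathbf{h}^{\theta\omega})(x)},$$

and this completes the proof.

□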
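The equality case of Jensen's inequality used above can be seen on a finite state space. The sketch below is an illustration only: it fixes a current state, treats one row of $\mathbf{M}$ as a probability vector `M_row`, treats $\mathbf{h}$ as a positive vector `h`, and drops the factor $\mathbf{G}(x)^2$, which does not depend on $x'$. All names and numbers are hypothetical.

```python
def second_moment(M_row, M_tilde_row, h):
    """sum_{x'} Mtilde(x') * (phi(x') h(x'))^2, with phi = M / Mtilde."""
    return sum(mt * ((m / mt) * hv) ** 2
               for m, mt, hv in zip(M_row, M_tilde_row, h))

def twisted_row(M_row, h):
    """The twisted proposal Mtilde(x') proportional to M(x') h(x')."""
    w = [m * hv for m, hv in zip(M_row, h)]
    s = sum(w)
    return [a / s for a in w]

M_row = [0.2, 0.5, 0.3]      # assumed transition probabilities
h = [1.0, 2.0, 4.0]          # assumed positive eigenfunction values
jensen_bound = sum(m * hv for m, hv in zip(M_row, h)) ** 2   # (M(h))^2

print(second_moment(M_row, M_row, h), jensen_bound)                   # untwisted
print(second_moment(M_row, twisted_row(M_row, h), h), jensen_bound)   # twisted
```

Under the twisted proposal the product $\phi\,\mathbf{h}$ is constant, so the second moment meets the Jensen lower bound; any other proposal with the same support gives a value at least as large.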



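Lemma 8. 2) ⇒ 3). If for $\mathbb{P}$-almost all $\omega \in \Omega$ there exists $A_\omega \in \mathcal{X}^{\otimes N}$ such that $\nu^{\otimes N}(A_\omega^c) = 0$ and for any $x \in A_\omega$,

$$\widetilde{\mathbf{M}}^\omega(x, B) = \frac{\int_B \mathbf{M}^\omega(x, dx')\,\mathbf{h}^{\theta\omega}(x')}{\int_{\mathsf{X}^N} \mathbf{M}^\omega(x, dz)\,\mathbf{h}^{\theta\omega}(z)}, \qquad \text{for all } B \in \mathcal{X}^{\otimes N},$$

then $\sup_n V_{n,N}^\omega < \infty$.

Proof. When

$$\widetilde{\mathbf{M}}^\omega(x, dx') = \frac{\mathbf{M}^\omega(x, dx')\,\mathbf{h}^{\theta\omega}(x')}{\int_{\mathsf{X}^N} \mathbf{M}^\omega(x, dz)\,\mathbf{h}^{\theta\omega}(z)}, \qquad (118)$$

for all $\omega$ and $x$, we have already mentioned that $\widetilde{\mathbf{M}} \in \mathbb{M}$ and seen in (44) that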

n— 1 1 / t / > \ n— 1 



nG«^-(Cp)/''-(Cp,cp+i) 

p=0 



h"(Co) 



If (118) holds for some $\omega, x$ then we notice that $\widetilde{\mathbf{M}}^\omega(x, \cdot)$ and $\mathbf{M}^\omega(x, \cdot)$ are mutually absolutely continuous, because $\mathbf{h}$ is strictly positive. Therefore if (118) holds only for $\omega$ in the set of full $\mathbb{P}$-measure where the hypothesis of the Lemma holds, and only for $x \in A_\omega$, then using the assumption that $\theta$ preserves $\mathbb{P}$, we find that $\nu^{\otimes N}$-null sets have no contribution to the following expectation:



MoR^(i) 



-N 



nG^^-(Cp)v^''"(Cp,cp+i)= 

,P=0 

- h-(Co)^ ' 



< 



/n-1 

Ha 

Vp=o / 

/n-1 \ 

Vp=o / 



Lh«""(C«) 



h'^ix) 



sup 



Furthermore, since $\mathbf{G}^\omega(x) = \lambda_\omega \mathbf{h}^\omega(x)/\mathbf{M}^\omega(\mathbf{h}^{\theta\omega})(x)$,



(119) 



> 



.p=0 



fn-l \ 
fn-l \ 

JJ^ AgPtj 

^P=0 / 



Combining (119 1 with (120), we arrive at 

MoR^(l) 



sup ■ 



n 



n 



fe"(^o) 

'i»"-(X„) Af»^-^"(/i«''-)(Xp_i) 



inf 



h'^ix) 



< sup 



h'^ix) 



where the final inequality is due to Proposition 2.



< oo, 



(120) 



□ 
Lemma 9. 3) ⇒ 1). If for $\mathbb{P}$-almost all $\omega$, $\sup_n V_{n,N}^\omega < \infty$, then $\Upsilon_N(\widetilde{\mathbf{M}}) = 0$.
Proof. Obvious.



□ 
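One way to spell out the step, sketched under two assumptions that are not restated at this point in the text: that $V_{n,N}^\omega \geq 1$ (as is the case when it is the relative second moment of an unbiased estimate, by Jensen's inequality), and that $\Upsilon_N(\widetilde{\mathbf{M}})$ is the almost-sure limit of $n^{-1}\log V_{n,N}^\omega$.

```latex
0 \;\leq\; \frac{1}{n}\log V_{n,N}^{\omega}
  \;\leq\; \frac{1}{n}\log\Big(\sup_{m\geq 1} V_{m,N}^{\omega}\Big)
  \;\longrightarrow\; 0, \qquad \text{as } n\to\infty .
```

Under those assumptions, the limit is squeezed between zero and a quantity that vanishes whenever the supremum is finite.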



Proof. ( of Lemma ^ 

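We first address 1) ⇒ 2). If $\Upsilon_N(\mathbf{M}) = 0$, then by Theorem 1 it must be the case that, up to $\omega$ and $x$ being in null sets,

$$\mathbf{M}^\omega(x, B) = \frac{\int_B \mathbf{M}^\omega(x, dx')\,\mathbf{h}^{\theta\omega}(x')}{\mathbf{M}^\omega(\mathbf{h}^{\theta\omega})(x)}, \qquad \text{for all } B \in \mathcal{X}^{\otimes N},$$

and therefore using (H2), we find there must exist a random variable, say $\chi$, such that

$$\text{for } \mathbb{P}\text{-a.a. } \omega, \quad \mathbf{h}^\omega(x) = \chi_\omega, \quad \text{for } \nu^{\otimes N}\text{-a.a. } x.$$

Under (H2), the first equation in (31) shows that the probability measure $\eta^\omega$ is equivalent to $\nu$ and therefore

$$\text{for } \mathbb{P}\text{-a.a. } \omega, \quad [\eta^\omega]^{\otimes N}(\mathbf{h}^\omega) = \chi_\omega.$$

But by the third equation in (31), we have $[\eta^\omega]^{\otimes N}(\mathbf{h}^\omega) = \eta^\omega(h^\omega) = 1$, and it follows that

$$\text{for } \mathbb{P}\text{-a.a. } \omega, \quad h^\omega(x) = 1, \quad \text{for } \nu\text{-a.a. } x. \qquad (121)$$

Now we show 2) ⇒ 3). Suppose that (121) holds. Then on appropriate sets, $Q^\omega(h^{\theta\omega}) = \lambda_\omega h^\omega$ reduces to

$$G^\omega(x) = \lambda_\omega =: C_\omega.$$

Finally, we show 3) ⇒ 1). Suppose there exists a random variable $C : \Omega \to \mathbb{R}_+$ such that

$$\text{for } \mathbb{P}\text{-a.a. } \omega, \quad G^\omega(x) = C_\omega, \quad \text{for } \nu\text{-a.a. } x.$$

Then since by hypothesis $\widetilde{\mathbf{M}} = \mathbf{M}$, we have $V_{n,N}^\omega = 1$ for all $n \geq 1$ and for $\mathbb{P}$-a.a. $\omega$. Therefore $\Upsilon_N(\mathbf{M}) = \lim_{n\to\infty} n^{-1}\log V_{n,N}^\omega = 0$ with $\mathbb{P}$ probability 1. This completes the proof. □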

Proofs for section 3

Proof, (of Lemma^ The proof of (50 1 is by induction. At rank n = the inequality holds trivially since 
<i>Q = $Q = Id. Suppose the inequality holds at rank n — 1. Then at rank n. 



< 



< 



+ 1 [<i>'^"""(r5(.,„_i(A.)) - 'f'"'''^iK-iii^))] (^) 



1 - 



rX',„_i(A^)Q'" "(1) 



^%..-MQ'"' "(1) 



and the result holds by the induction hypothesis and some simple manipulations 



The proof of (51) is also by induction. At time $n = 0$, the random variables $\{\zeta_0^i\}_{i=1}^N$ are i.i.d. according to $\mu_0$, so by the Marcinkiewicz-Zygmund (MZ) inequality,



-AT 



Vn' 
Suppose the inequality holds at rank $n - 1$. Then at rank $n$,



-N 



1/p 



< E' 



JV 



1/p 



-N 



(?7^i)-r^' "'^(C-i) M 



1/p 



<-rr"«-i) M 



(122) 

(123) 
(124) 



The term in (122) is dealt with by application of the MZ inequality and the term in (124) is dealt with using (50). For the term in (123),



^ N 



(C-i)-r^ "«-i) (^) 



1 r 



1 



1 \ ^^-iQ'" 



1 



e-iQ'""-(i) 



and the result follows by elementary manipulations involving the MZ inequality and the induction hypothesis. 

□ 

Proof. ( of Lemma^Yov a G M let [a] be the integer part of a and let {a} = a— [a] . Consider the decomposition 

(n+l)N 

y^M-'^if^)^ ^ C/f(D 

k=l 

where for any 1 < A: < (n + l)^'^ and (p, i) satisfying k = pN + i, 



( 1 



N 
1 

L v/iV 



/p^(§)-5^^""(ei)(/p^) 

/p^(c;)-'i>^''"^n^^-i)(/p^) 



k-pN =\, 
k - pN = i > 1. 



We will establish distributional convergence of the process 

[Nt]+N 

fe=i 



to a continuous Gaussian martingale by application of [Jacod and Shiryaev, 2002, Theorem 3.33, p. 478]. The dependence on $\omega$ of $(f^N)$, $(f)$ and various other quantities is suppressed from the notation in the remainder of the proof.

For fixed $N$, let $\mathcal{H}_k^N$ be the $\sigma$-algebra generated by the random variables $\zeta_p^i$ for any $(p, i)$ such that $pN + i \leq k$. Then

[Nt]+N 



where 



j2 ^Niu^ir'iu.^if'-'m^-i] =cf(r^r^^), 
fc=i 
" N



1 \ " 



p ■ 



p=0 



and 



-I [[Nt] -N[t]> 2] 



N 



with the convention $^ '"^(57^1) = ^"^(V-i) = A^o- 
Now for n > consider 



c„(r , /■'") ^ < [(/; - - . 

p=0 

By Corollary [1] for any n > and t £ M+, (and using [Nt]/N t) 
in probability as iV — > cx). Defining 

$$C_t(f, f^T) := C_{[t]}(f, f^T) + \{t\}\left(C_{[t]+1}(f, f^T) - C_{[t]}(f, f^T)\right)$$

we have proved that for any t G M_|_, 

[Nt]+N 

fc=i 

in probability as TV 00. 

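We now need to verify that the conditional Lindeberg condition is satisfied: we shall prove that for any $t \in \mathbb{R}_+$ and $\epsilon > 0$,

$$\sum_{k=1}^{[Nt]+N} \mathbb{E}\left[\left|U_k^N(f^N)\right|^2 \mathbb{I}\left[\left|U_k^N(f^N)\right| > \epsilon\right] \,\middle|\, \mathcal{H}_{k-1}^N\right] \to 0, \qquad \text{in probability as } N \to \infty. \qquad (125)$$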



To this end. fix t e M+. set n — [t] + l and define ||<y5||„ :— yQ<p<n,i<q<m \\'Pp,q\\- Now for any 1 < fc < [Nt]+N 
setting p ^ [{k — 1)/^], we have by hypothesis of the lemma that (3^^ is measurable w.r.t. 'H^_i for any 
q € {1, ...,m}, and therefore 
Kin\"nK{n\>e]\n^



hp,q\\ I 



/ m 
\q=l 



2 r 



L g=l 



2 r 



2|1^I1„E 

9=1 



> eVN 



< 



eiV3/2 



\9=1 



so in turn, 



lNt]+N 



n / ni 



E ^N[\u^in\'nKin\>e]\H^<^{ML) E El/^. 



/c=l 



N I 



p=0 \(j=l 



(126) 



The right hand side of (126) converges to zero in probability as $N \to \infty$ due to (54) and the continuous mapping theorem. This establishes (125), as required.

Therefore $\{X_t^N(f^N); t \in \mathbb{R}_+\}$ converges in law to a continuous Gaussian martingale $\{X_t(f); t \in \mathbb{R}_+\}$ such that for any $t \in \mathbb{R}_+$ and $1 \leq i, j \leq d$,

The proof is complete upon noting (/^) = VnM^^^^ {f'^). □ 



Proof, (of Theorem^ We first address convergence of the unnormalized measures. Consider the decompo- 
sition 



p=0 



E7p'''(i) 



p=0 



(^^'.i) (tA^"-) 



= E%"'''(i) - (7^-i)l ^fv:^) 



p=0 



(127) 



with the conventions that A^'^ = 1, '"^ (77^1) = V% '"^ (77^1) = ^0, and where 



rU).N . /OWw 
Jp,n ■= yn-p(<P) 



57;-i)Qr-pM 



37 



and 



7^t 



N 



p=0 



N 



p=0 



Corollary [T| and the continuous mapping theorem ensure that 7^'^(1) — > 7"(1) and \fNTZ^'^ in 
probability as — cx). Then by Slutsky's lemma, together with Lemma ([s]) applied to 127 the proof of 
(55 1 is complete. 

Now for the normalized measures. Consider the decomposition: 



N vh-i) 



p=0 



^9" 



where tp — ip ~ rj^lKip) and where 



1 - 



with the convention (^-i) ~ Mo • Lemma [H] provides the desired convergence in distribution of 

Ain'^if), SO by Slutsky's lemma, it remains only to prove that VNTZ^'^{ip) — > in probability. To this 
end, notice that 



< 



E 

n 



N 



-N 



1/2 



-TV 



1/2 



and 
-N



^9" 



Qn-piv) - VpQn-piv) 



< E 



N 



%~lQn^p+l{v) 



+ 



< 



Qn-pm 



1/2 



1/2 



%^lQn-p+l{V>) - Vp-lQn-p+li'P) 



1/2 



^p-1 {G^""' 

1 



-AT 



1/2 



Vp-lQn^p+li^) - Ip-lQn-p+liv) 



1/2 



□ 



Then by application of Lemma |4] we conclude that there exists a constant hi^{n) such that 

which implies, via Markov's inequality, that TZ^''^{(p) — > in probability, as required. 

We now turn to the proof of Proposition 5. As per (88), we will work with the non-negative kernel

$$\mathbf{R}(\omega, x, dx') = \mathbf{G}(\omega, x)^2 \phi^\omega(x, x')^2 \widetilde{\mathbf{M}}(\omega, x, dx').$$

In order to prove Proposition 5 we shall introduce the kernel

S(a;, X, dx') 



which may be regarded as a randomized similarity transform of R. Then writing, in the usual fashion. 



n > 1, 



(128) 



it is clear that 



[\i'^{x)f 



and we have 

Lemma 10. Assume (HI), (H2) and 



Then for any a; € X 



N 



il}'^(x) 
sup , < oo. 



hm -logS^(l)(a;)= lim - logR:^(l)(x), 

n-^oo Tl n-^oo ft 



Proof. Using the bound on $\sup h^\omega(x)/h^{\omega'}(x')$ of Proposition 2, we have



^logS-(l)(x) = ilogR- 



(x)--log [h"(x)] 



< -logR-(l)(x) + -, 
n n 



for some finite constant $c$ which does not depend on $\omega$ or $n$. This bound, combined with a similar lower one, completes the proof. □
The second component in the proof of Proposition 5 is the following Lemma, whose proof we briefly postpone.

Lemma 11. Assume (H2), 

x,x'' "0" (-^ ) 



and fix $\omega \in \Omega$. Then for any $x \in \mathsf{X}^N$ and $N \geq 2$,
S'^{l){x 



1 



< 



1 



2 sup ,„ , ,, — 1 



sup 



{z) 



where X^^ is as in Proposition^ 

Proof. (of Proposition 5). Let us choose the initial distribution to be the eigen-measure $\eta^\omega$ defined in Proposition 2. Iterative application of the equation $\eta^\omega Q^\omega = \lambda_\omega \eta^{\theta\omega}$ shows that



■ji-i 



p=0 



Lp=0 

and then by Lemma 10, Proposition 4, (128) and the property that $\theta$ preserves $\mathbb{P}$, we have

~ 1 2 ""^ 

T^(M) =v-a... lim - log S^(l)(:r) - - V log Xe.^ < log 



p=0 



S"(l)(x) 
ess sup^ \ sup 



Applying Lemma [lT| and noticing sup^gx" M'^(t/>^")(2)/M'^(h''™)(z) < sup^gx the proof is 

complete. □ 



Proof. (of Lemma 11). At various places in the proof we shall write, for some suitable function $\varphi$, $\mathrm{osc}(\varphi) := \sup_{x, y} |\varphi(x) - \varphi(y)|$.
The starting point is the expression:



S"(l)(x) 
XI 



- 1 = 



— -VT^M"(-0''")(a;)M M.^{x,dz) ' ' - 1 

M"(a;,dz) 



M-(i^''")(x)G-(a:)2 f ^^^^ ^ y-{zf 



h"(x)2A2 
M}^{h<^^'){xY Jx" 



X" 



- 1 



M'^(a:,dz) - 1 



M'^(x,dz)h^"(z) 
xiv M"(h««')(a;) 



M'^(^^")(a:) h'^"(z) 



- 1 



X" 



M"(x,dz) 



dM'^(a;, •) 



(z)-l 



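where $\mathbf{M}^\omega(\mathbf{h}^{\theta\omega}) = \lambda_\omega \mathbf{h}^\omega/\mathbf{G}^\omega$ has been used, and the final equality, included only for purposes of exposition, is valid with $\mathbf{M}^\omega_{\mathrm{opt}}(x, dz) := \mathbf{M}^\omega(x, dz)\,\mathbf{h}^{\theta\omega}(z)/\mathbf{M}^\omega(\mathbf{h}^{\theta\omega})(x)$.

The main strategy of the proof is to introduce, for each $\omega \in \Omega$, two judiciously chosen Markov kernels, $K^\omega : \mathsf{X}^N \times \mathcal{X}^{\otimes 2} \to [0, 1]$ and $L^\omega : \mathsf{X}^N \times \mathcal{X}^{\otimes 2} \to [0, 1]$ such that we can write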



m'^{x,dz)\i'^'^{z) 
xiv M'^(h''^")(a;) 



M'^(i/>^")(a;) h^"(z) 



X2 



M"(h««')(a;) 'ip'^'^{z) 
K'^ix, dz^--^) - dz^--'^)\ f^ix, z^), 



(129) 
where



M"(h»"')(x) V*"(z) 



- 1 e 



(130) 



and such that we can control the magnitude of (129) using an estimate of $\|K^\omega(x, \cdot) - L^\omega(x, \cdot)\|_{tv}$. Now for any $\omega$ and $x$, by definition $\mathbf{M}^\omega(x, \cdot)$ is a symmetric measure. For $1 \leq q \leq N$ and $x \in \mathsf{X}^N$, we shall write $\mathbf{M}^\omega_{(q)}(x, \cdot)$ for the marginal of $\mathbf{M}^\omega(x, \cdot)$ over the first $q \leq N$ coordinates.
The first Markov kernel we introduce is:

N N 



1 M"(a;,rf3)/i''"(3*) (3 ) 



fe=itr-^x"^ M"(h«-)(x) ^f^,^«-(3:') 
where here, and henceforth. 3 = (3^, ...,3^) G X^. Elementary manipulations then yield 

"M'^(-0^")(x) h^"(z) 



if-(xdz-)r(xz^)- / M-{x,dz)h^-iz) 



M"(h»"')(2:) ,/,«'^(z 



where is as in (130) 



The second Markov kernel is: 



L''{x,dz^--^) 



so that then 



Furthermore, we have: 



X2 



M^2)(x,(izi^2)^e.^(^i)^e^(-^2) 
M'^(h»"')(j;)M"(V'*")(a;) 

L'^{x,dz^--^)nx,z^) = 0. 



K"(a;,dz^^^) 

^^Ar M-(a:,d3)fe^-(3'-) V>^-(3fe) 

iv^^ix- M-(h«-)(x) E.li'/^^-la^") ^''"'^ 

1\ (a;,dzi^2);je^(-^i)^e^(^2) 



M'^(h»"')(a;) 



M-^_2)(a:,d3i^^-2) 



XlV-2 ^-1 



> 1- 



1\ M^2)(a;,dzi^2)/je^^(^i)^ea;(^2) 



N-2 



XN-2 



M"(h»'^)(x) 

Then, noting that L'^(a;, •) < ii:"(a;, •), for M'f^^-^{x, •) - almost all z^'-^ £ X 



(131) 



l:2^ 



dK'^ix, 



< 1 



V 



JV-2 



^^-(zi) + <-(z2)+^^«"(3^) 



< 1- 



Nj M'^{xpO'^){x) N 



N-2 



^«-(zi)+/-(z2)+ / ^ V'"(3^')M^^_2)(x,ci3'^^-'^) 



iV- 1 



'tP'"^{z^)+'tP'"^{z') 
M"('0^")(a:;) 



- 1 
where the first inequality uses (131) and the second is due to Jensen's inequality.



Since V^(x, ■) < K^{x, ■) we then have, with A = {z^-^ e : ^ ^i' 



dL^ix,-) , 1:2 



(z-)-l 



< 



< 



TV- 1 



M'^{ip^'^){x) 



K'^{x,dz^--^) 

K'^{x,dz^--^). 



N -I 



The tv-norm can also be expressed as



\\K-{x,-)-L-{x,-)\\,^^ sup 

{t^:osc(f6)<l} 



2sup3V/"(3) ^ 
M"(i/)''")(a;) 



[i^"(x,dzi^2)-L'^(a;,dzi^2)] ^(^i:2) 



and combining this with (129) and (132) we obtain
S-(l)(x) 



A2 



1 



[if" (a;, _ ^^(^.^ ^^l:2)j ^2) 



< 



< 



1 



TV- 1 
1 

TV- 1 



V z) , 
2 sup -s— - 1 

2 sup , „ ; - 1 



osc(r(a;,.)) 



(132) 



□ 
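The variational form of the tv-norm used in this proof can be checked on a finite state space. The sketch below assumes the convention stated above, namely a supremum over test functions with oscillation at most one. For probability vectors this coincides with the largest discrepancy over sets, and with half the $\ell_1$ distance. The function names and the example vectors are hypothetical.

```python
from itertools import combinations

def tv_half_l1(p, q):
    """Half the l1 distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def tv_sup_indicators(p, q):
    """sup_A |p(A) - q(A)|: indicators 1_A are test functions with osc <= 1."""
    idx = range(len(p))
    best = 0.0
    for r in range(len(p) + 1):
        for A in combinations(idx, r):
            best = max(best, abs(sum(p[i] - q[i] for i in A)))
    return best

def integrate_difference(p, q, f):
    """Integral of f against the signed measure p - q."""
    return sum((a - b) * fv for a, b, fv in zip(p, q, f))

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
f = [0.2, 0.9, 0.4]          # an arbitrary function with oscillation 0.7
print(tv_half_l1(p, q), tv_sup_indicators(p, q), abs(integrate_difference(p, q, f)))
```

Any test function with oscillation at most one, such as `f` here, gives an integral no larger in magnitude than the tv distance, which is the mechanism by which the bound on $\|K^\omega(x,\cdot) - L^\omega(x,\cdot)\|_{tv}$ controls (129).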